robean <st1...@gmail.com> writes:
> reach the urls with urllib2. The actual program will involve fairly
> elaborate scraping and parsing (I'm using Beautiful Soup for that) but
> the example shown here is simplified and just confirms the url of the
> site visited.
Keep in mind that Beautiful Soup is pretty slow, so if you're processing a lot of pages and have multiple CPUs, you probably want parallel processes rather than threads (CPython's global interpreter lock keeps threads from running Python code on more than one CPU at a time).

> wrong? I am new to both threading and urllib2, so its possible that
> the SNAFU is quite obvious..
> ...
> ulock = threading.Lock()

Without looking at the code for more than a few seconds, using an explicit lock like that is generally not a good sign. The usual Python style is to send all inter-thread communication through Queues. You'd dump all your URLs into a queue and have a bunch of worker threads getting items off the queue and processing them. This avoids a lot of lock-related headaches. The price is that you sometimes use more threads than strictly necessary. Unless it's a LOT of extra threads, it's usually not worth the hassle of messing with locks.
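
A minimal sketch of the queue-based layout described above, written for Python 3 (where the module is `queue`; in Python 2 it is `Queue`). The names `fetch`, `worker`, `process_all`, and `NUM_WORKERS` are made up for illustration, and `fetch` is only a stand-in that returns the URL instead of touching the network:

```python
import queue
import threading

NUM_WORKERS = 4  # hypothetical pool size, tune to taste


def fetch(url):
    """Stand-in for the real work (download and parse the page).

    It just echoes the URL so the sketch runs without network access.
    """
    return url


def worker(tasks, results):
    # Pull URLs off the task queue until the None sentinel arrives.
    while True:
        url = tasks.get()
        if url is None:
            break
        results.put((url, fetch(url)))


def process_all(urls, num_workers=NUM_WORKERS):
    tasks = queue.Queue()
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker so each one exits
    for t in threads:
        t.join()
    # Every worker has exited, so nothing else can add to `results`.
    out = []
    while not results.empty():
        out.append(results.get())
    return out
```

The two queues are the only state the threads share, so no explicit lock appears anywhere. A real `fetch` would call urllib2 (`urllib.request` on Python 3) and hand the page to the parser; results come back in whatever order the workers finish.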