Re: BeautifulSoup doesn't work with a threaded input queue?

Christopher Reimer via Python-list Sun, 27 Aug 2017 14:18:00 -0700

On 8/27/2017 1:31 PM, Peter Otten wrote:

Here's a simple example that extracts titles from generated html. It seems
to work. Does it resemble what you do?

Your example is similar to my code when I'm using a list for the input to the parser. You have soup_threads and write_threads, but no read_threads.

The particular website I'm scraping requires checking each page for the sentinel value (i.e., "Sorry, no more comments") in order to determine when to stop requesting pages. For my comment history, that's ~750 pages to parse for ~11,000 comments.
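
A minimal sketch of that stopping rule, not the actual scraper: fetch_page is a hypothetical stand-in that generates HTML instead of making an HTTP request, and last_page and max_pages are made-up parameters for illustration. Only the sentinel text comes from the site.

```python
SENTINEL = "Sorry, no more comments"


def fetch_page(page_number, last_page=3):
    """Hypothetical stand-in for the real HTTP request (assumption)."""
    if page_number > last_page:
        # Past the end of the history, the site shows the sentinel text.
        return "<html><body><p>%s</p></body></html>" % SENTINEL
    return "<html><body><p>comment from page %d</p></body></html>" % page_number


def fetch_until_sentinel(fetch=fetch_page, max_pages=1000):
    """Request pages in order until one contains the sentinel text."""
    pages = []
    for page_number in range(1, max_pages + 1):
        html = fetch(page_number)
        if SENTINEL in html:
            break  # no more comments, stop requesting pages
        pages.append(html)
    return pages


pages = fetch_until_sentinel()
print(len(pages))
```

The max_pages cap is only a safety net so the loop still ends if the sentinel text ever changes.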

I have 20 read_threads requesting and putting pages into the output queue that is the input_queue for the parser. My soup_threads can get items from the queue, but BeautifulSoup doesn't do anything after that.
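
For reference, the shape of that pipeline can be sketched as below. This is a sketch under stated assumptions, not my real code and not a diagnosis: fake_fetch generates HTML instead of requesting it, the standard library's HTMLParser stands in for BeautifulSoup so the example runs without third-party packages, and all function names and thread counts are made up for illustration.

```python
import queue
import threading
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside <title> (stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def fake_fetch(n):
    """Hypothetical replacement for the HTTP request (assumption)."""
    return "<html><head><title>page %d</title></head></html>" % n


def reader(todo, pages):
    # Take page numbers until a None marker arrives, then exit.
    while True:
        n = todo.get()
        if n is None:
            break
        pages.put(fake_fetch(n))


def parser(pages, titles):
    # Take HTML strings until a None marker arrives, then exit.
    while True:
        html = pages.get()
        if html is None:
            break
        p = TitleParser()
        p.feed(html)
        titles.put(p.title)


def run(num_pages=10, num_readers=4, num_parsers=2):
    todo, pages, titles = queue.Queue(), queue.Queue(), queue.Queue()
    readers = [threading.Thread(target=reader, args=(todo, pages))
               for _ in range(num_readers)]
    parsers = [threading.Thread(target=parser, args=(pages, titles))
               for _ in range(num_parsers)]
    for t in readers + parsers:
        t.start()
    for n in range(1, num_pages + 1):
        todo.put(n)
    for _ in readers:
        todo.put(None)   # one shutdown marker per reader
    for t in readers:
        t.join()         # every page is queued once the readers finish
    for _ in parsers:
        pages.put(None)  # one shutdown marker per parser
    for t in parsers:
        t.join()
    return [titles.get() for _ in range(num_pages)]


print(sorted(run()))
```

The parser markers are queued only after the readers have been joined, so no parser thread can exit while pages are still on their way.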


Chris R.
--
https://mail.python.org/mailman/listinfo/python-list
