Greetings,

I have a Python 3.6 script on Windows that scrapes comment history from a website. It's currently set up this way:

Requestor (threads) -> list -> Parser (threads) -> queue -> CSVWriter (single thread)

It takes 15 minutes to process ~11,000 comments.

When I replaced the list between the Requestor and Parser with a queue to speed things up, BeautifulSoup stopped working.
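
A minimal sketch of the queue-based layout described above, using only the standard library. It is not the actual script: fetch_page and parse_comments are hypothetical stand-ins for the real HTTP request and the BeautifulSoup call, and the sentinel-based shutdown is an assumption about how the stages could be stopped.

```python
import queue
import threading

SENTINEL = None  # assumed end-of-stream marker, one per consumer thread


def fetch_page(url):
    # Hypothetical stand-in for the real HTTP request.
    return "<p>comment from {}</p>".format(url)


def parse_comments(contents):
    # Hypothetical stand-in for BeautifulSoup(contents, "lxml").
    return contents.replace("<p>", "").replace("</p>", "")


def requestor(urls, page_queue):
    # Fetch each page and hand the raw contents to the parsers.
    for url in urls:
        page_queue.put(fetch_page(url))


def parser(page_queue, row_queue):
    # Parse pages until the sentinel arrives.
    while True:
        contents = page_queue.get()
        if contents is SENTINEL:
            break
        row_queue.put(parse_comments(contents))


def writer(row_queue, rows):
    # Single consumer; a real script would write CSV rows here.
    while True:
        row = row_queue.get()
        if row is SENTINEL:
            break
        rows.append(row)


def run_pipeline(urls, n_requestors=2, n_parsers=2):
    page_queue = queue.Queue()
    row_queue = queue.Queue()
    rows = []

    # Split the URLs across the requestor threads.
    chunks = [urls[i::n_requestors] for i in range(n_requestors)]
    requestors = [threading.Thread(target=requestor, args=(chunk, page_queue))
                  for chunk in chunks]
    parsers = [threading.Thread(target=parser, args=(page_queue, row_queue))
               for _ in range(n_parsers)]
    writer_thread = threading.Thread(target=writer, args=(row_queue, rows))

    for t in requestors + parsers + [writer_thread]:
        t.start()

    # Shut the stages down in order: requestors, then parsers, then the writer.
    for t in requestors:
        t.join()
    for _ in parsers:
        page_queue.put(SENTINEL)
    for t in parsers:
        t.join()
    row_queue.put(SENTINEL)
    writer_thread.join()
    return rows
```

Calling run_pipeline(["a", "b", "c", "d"]) returns the four parsed strings, in whatever order the parser threads finished them.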

When I changed BeautifulSoup(contents, "lxml") to BeautifulSoup(contents), I got a UserWarning that no parser was explicitly specified, along with a reference to line 80 in threading.py (which is inside the RLock factory function).

When I switched back to using a list between the Requestor and Parser, the Parser worked again.

Does BeautifulSoup not work with a threaded input queue?

Thank you,

Chris Reimer

--
https://mail.python.org/mailman/listinfo/python-list
