"html5lib" is apparently not thread safe. (see "http://code.google.com/p/html5lib/issues/detail?id=189") Looking at the code, I've only found about three problems. They're all the usual "cached in a global without locking" bug. A few locks would fix that.
But html5lib calls the XML SAX parser. Is that thread-safe, or is there more trouble further down the stack? (I run a multi-threaded web crawler and currently use BeautifulSoup, which is thread-safe, although dated. I'm looking at converting to html5lib.)

John Nagle