Steve Holden wrote:
John Nagle wrote:
Paul Rubin wrote:
John Nagle <na...@animats.com> writes:
Analysis of each domain is
performed in a separate process, but each process uses multiple
threads to read and process several web pages simultaneously.

   Some of the threads go compute-bound for a second or two at a time as
they parse web pages.
You're probably better off using separate processes for the different
pages.  If I remember, you were using BeautifulSoup, which while very
cool, is pretty doggone slow for use on large volumes of pages.  I don't
know if there's much that can be done about that without going off on a
fairly messy C or C++ coding adventure.  Maybe someday someone will do
that.
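As a minimal sketch of the process-per-page approach suggested above (not anyone's actual code from this thread): CPU-bound parsing can be farmed out to a pool of worker processes with the stdlib multiprocessing module, so each parse runs under its own interpreter and GIL. The parse_page function here is a hypothetical stand-in for a BeautifulSoup-based parser.

```python
# Sketch: parse pages in worker processes so CPU-bound work
# does not contend for a single GIL.
from multiprocessing import Pool

def parse_page(html):
    # Hypothetical stand-in for real (e.g. BeautifulSoup) parsing;
    # here we just count anchor tags.
    return html.count("<a ")

if __name__ == "__main__":
    pages = [
        "<a href='x'>one</a>",
        "<a href='y'>two</a> <a href='z'>three</a>",
    ]
    with Pool(processes=2) as pool:
        link_counts = pool.map(parse_page, pages)
    print(link_counts)  # [1, 2]
```

Each worker pays a serialization cost for its input and result, so this pays off mainly when the parse itself dominates, as described above.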
   I already use separate processes for different domains.  I could
live with Python's GIL as long as moving to a multicore server
doesn't make performance worse.  That's why I asked about CPU dedication
for each process, to avoid thrashing at the GIL.

I believe it's already been said that the GIL thrashing is mostly MacOS
specific. You might also find something useful in the affinity module.
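For the CPU-dedication idea, a hedged sketch: on Linux, modern Python (3.3+) exposes processor affinity directly in the stdlib via os.sched_setaffinity, without the third-party affinity module. This pins the calling process to a single core chosen from its currently allowed set; it is Linux-only and was not available when this thread was written.

```python
# Sketch: pin the current process to one CPU so sibling processes
# stay on their own cores. Linux-only (os.sched_setaffinity).
import os

if hasattr(os, "sched_setaffinity"):
    allowed = os.sched_getaffinity(0)   # pid 0 means the calling process
    one_cpu = {min(allowed)}            # pick one core from the allowed set
    os.sched_setaffinity(0, one_cpu)    # pin this process to that core
    print(os.sched_getaffinity(0))      # now a single-element set
```

Whether pinning actually reduces GIL-related thrashing depends on the platform's mutex implementation, which is the open question in this thread.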

   No, the original analysis was MacOS-oriented, but the same contention
mechanism applies to the GIL on all platforms.  There was some
pontification that it might be a MacOS-only issue, but no facts
were presented.  It might be cheaper on C implementations with mutexes
that don't make system calls for the non-blocking cases.

                                        John Nagle
--
http://mail.python.org/mailman/listinfo/python-list