Hi,

Thanks for the answer. I use Linux with CPython 2.7. I plan to work on both CPU-bound and I/O-bound problems. Which packages should I use in each case? Could you point me to some guides? When should I use multiprocessing, and when gevent?
Thanks,

Laszlo

On Thu, May 10, 2012 at 2:34 PM, Dave Angel <d...@davea.name> wrote:
> On 05/10/2012 08:14 AM, Jabba Laci wrote:
>> Hi,
>>
>> I would like to do some parallel programming with Python but I don't
>> know how to start. There are several ways to go but I don't know what
>> the differences are between them: threads, multiprocessing, gevent,
>> etc.
>>
>> I want to use a single machine with several cores. I want to solve
>> problems like this: iterate over a loop (with millions of steps) and
>> do some work at each step. The steps are independent, so here I would
>> like to process several steps in parallel. I want to store the results
>> in a global list (which should be "synchronised"). Typical use case:
>> crawl webpages, extract images and collect the images in a list.
>>
>> What's the best way?
>>
>> Thanks,
>>
>> Laszlo
>
> There's no single best way. First question is your programming
> environment. That includes the OS you're running, and the version # and
> implementation of Python.
>
> I'll assume you're using CPython 2.7 on Linux, which is what I have the
> most experience on. But after you answer, others will probably make
> suggestions appropriate to whatever you're actually using.
>
> Next question is whether the problem you're solving at any given moment
> is CPU-bound or I/O-bound. I'll try to answer for both cases, here.
>
> CPU-bound:
> In CPython 2.7, there's a GIL, which is a global lock preventing more
> than one CPU-bound thread from running at the same time. It's more
> complex than that, but bottom line is that multiple threads won't help
> (and might hurt) a CPU-bound program, even in a multi-core situation.
> So use multiple processes, and cooperate between them with queues or
> shared memory, or even files. In fact, you can use multiple computers,
> and communicate using sockets, in many cases.
>
> IO-bound:
> This is what CPython is good at solving with threads. Once you make a
> blocking I/O call, usually the C code involved releases the GIL, and
> other threads can run. For this situation, the fact that you can share
> data structures makes threads a performance win.
>
> Web crawling is likely to be IO-bound, but I wanted to be as complete as
> I could.
>
> --
>
> DaveA
> --
> http://mail.python.org/mailman/listinfo/python-list
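
For what it's worth, the two cases Dave describes can be sketched roughly like this with the standard library alone. This is only an illustration under assumptions, not a recommendation from the thread: `cpu_step` and `io_step` are hypothetical stand-ins for the real per-step work, and the worker counts are arbitrary. `multiprocessing.Pool` runs the steps in separate processes (so the GIL is not shared), while `multiprocessing.dummy.Pool` offers the same interface backed by threads.

```python
import time
from multiprocessing import Pool                       # process-backed pool
from multiprocessing.dummy import Pool as ThreadPool   # same API, thread-backed


def cpu_step(n):
    """Hypothetical CPU-bound step: sum of squares below n."""
    return sum(i * i for i in range(n))


def io_step(name):
    """Hypothetical I/O-bound step; a real crawler would fetch a URL here."""
    time.sleep(0.01)  # stands in for a blocking call, which releases the GIL
    return name.upper()


def run_cpu_bound(inputs, workers=4):
    """Map cpu_step over inputs using separate processes."""
    pool = Pool(processes=workers)
    try:
        return pool.map(cpu_step, inputs)  # results come back in input order
    finally:
        pool.close()
        pool.join()


def run_io_bound(inputs, workers=4):
    """Map io_step over inputs using threads in a single process."""
    pool = ThreadPool(workers)
    try:
        return pool.map(io_step, inputs)  # results come back in input order
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(run_cpu_bound([10, 100, 1000]))
    print(run_io_bound(["a.png", "b.png"]))
```

Because `map` collects the return values into an ordinary list in the parent, the "synchronised global list" from the original question is not needed in this sketch; each step just returns its result.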