grocery_stocker wrote:
Let's say there is a new zip file with updated information every 30
minutes on a remote website. Now, I want to connect to this website
every 30 minutes, download the file, extract the information, and then
have the program search the file for certain items.

Would it be better to use threads to break this up? I'd have one thread
download the data and then have another actually process the data.
Or would it be better to use fork?


I concur with Diez: I don't think threading/forking will bring significant advantages in this particular case.
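For what it's worth, a plain sequential loop would probably do the job here. A rough sketch (the URL and item names are made up, and it assumes Python 3's urllib.request):

import io
import time
import urllib.request
import zipfile

FEED_URL = "http://example.com/data.zip"   # placeholder URL
WANTED = {"item_a", "item_b"}              # placeholder items to look for

def check_feed():
    """Download the zip, scan every member, return the items found."""
    data = urllib.request.urlopen(FEED_URL).read()
    hits = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            text = zf.read(name).decode("utf-8", errors="replace")
            hits.extend(item for item in WANTED if item in text)
    return hits

if __name__ == "__main__":
    while True:
        print(check_feed())
        time.sleep(30 * 60)   # wait 30 minutes before the next download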

That said, if you are thinking from a responsiveness perspective, I would definitely say threading.
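For example, if there is a GUI or some other part of the program that has to stay responsive, you can push the download onto a daemon thread and hand results back through a Queue. Another rough sketch (check_feed stands in for the download/search step above):

import queue
import threading
import time

def check_feed():
    # stand-in for the download/search step sketched earlier
    return ["item_a"]

results = queue.Queue()

def poll_feed():
    # background thread: fetch and scan the feed every 30 minutes,
    # handing any hits back to the main thread via the queue
    while True:
        results.put(check_feed())
        time.sleep(30 * 60)

threading.Thread(target=poll_feed, daemon=True).start()

# the main thread stays free to do other work (GUI, prompt, ...)
while True:
    try:
        print("new results:", results.get(timeout=1.0))
    except queue.Empty:
        pass  # nothing yet; keep the main loop responsive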

If you are asking from a performance perspective, I would need to know what OS you are running (that is, whether forking is even supported), whether you have multiple CPUs, and whether you are actually planning on spawning that sub-process on a (possibly) different CPU than the parent process.

So the workflow would be something like this:
- Download a block.
- If the block has enough data to process, spawn a new process (using the multiprocessing module) and let it write the result back to x (requiring a lock acquire and release); see the sketch below.
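A rough sketch of that shape with the multiprocessing module (the blocks, the item names, and the shared list standing in for x are all placeholders):

import multiprocessing

WANTED = ("item_a", "item_b")          # placeholder items to look for

def process_block(block, results, lock):
    # worker process: scan one downloaded block for the wanted items
    hits = [item for item in WANTED if item in block]
    with lock:                         # acquire/release around the shared write
        results.extend(hits)

if __name__ == "__main__":
    manager = multiprocessing.Manager()
    results = manager.list()           # the shared "x" from above
    lock = manager.Lock()

    workers = []
    for block in ("...block with item_a...", "...another block..."):
        p = multiprocessing.Process(target=process_block,
                                    args=(block, results, lock))
        p.start()
        workers.append(p)

    for p in workers:
        p.join()
    print(list(results))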

The thing to keep in mind: is the overhead of
- multiple interpreters running on multiple CPUs
- IPC
- locking/releasing
still less than if you had no threading at all?

About forking: this usually means that the child process starts out as an exact copy of the parent process and ideally runs mostly independently of the parent, so in the best case the child process can run fine without the presence of the parent process. Is this really what you want to do?
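For illustration, the bare os.fork() pattern looks like this (POSIX only, so not available on Windows):

import os

pid = os.fork()              # child gets a copy of the parent's memory
if pid == 0:
    # child: could keep running even if the parent went away
    print("child, pid", os.getpid())
    os._exit(0)              # exit the child without running cleanup twice
else:
    # parent: reap the child so it does not linger as a zombie
    os.waitpid(pid, 0)
    print("parent, pid", os.getpid())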

--
MPH
http://blog.dcuktec.com
'If consumed, best digested with added seasoning to own preference.'
