On 2019-05-23 22:41, Avi Gross via Python-list wrote:
Bob,

As others have noted, you have not made it clear how what you are doing
actually runs "in parallel."

I have a similar need: thousands of folders, an analysis to run on the
contents of each, and 8 cores available, but the job could run for months if
done serially. The results are placed within the same folder, so each part
can run independently as long as shared resources like memory are not abused.

Your need is conceptually simple. Break the list of filenames into N batches
of roughly equal length. A simple approach is to open N terminal or command
windows and, in each one, start a Python interpreter by hand running the same
program, with each instance given one of the file lists to work on. Some may
finish well ahead of others, of course. If anything they do writes to shared
resources such as log files, you will want to be careful. There is no
guarantee that the processes will not land on the same CPU, and there is
plenty of overhead in running full processes. I am not recommending this
approach, but it is easy to set up and may give you enough of a speedup.
Since your whole run only seems to take a few minutes, though, the gain will
be modest. A rough sketch of the batch-splitting step follows.
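For instance, a minimal sketch of the splitting step (the names
"all_files.txt" and "batch_N.txt" here are just placeholders for whatever
your program expects):

import sys

def split_into_batches(filenames, n):
    # Stride slicing gives n sublists of roughly equal length.
    return [filenames[i::n] for i in range(n)]

if __name__ == "__main__":
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 8
    with open("all_files.txt") as f:                  # placeholder input list
        names = f.read().split()
    for i, batch in enumerate(split_into_batches(names, n)):
        with open("batch_%d.txt" % i, "w") as out:    # one list per window
            out.write("\n".join(batch))

Each hand-started interpreter would then read its own batch_N.txt and
process only those files.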

Quite a few other solutions involve threads running within a single process,
typically fed from a queue. Python has multiple ways to do this. You would
feed all the needed info (file names, in your case) into a queue, start up to
N worker threads, and let each one pull the next item whenever it finishes
the last, until the queue is drained. Unless one item takes very long, the
workers should all finish reasonably close together. Again, there are details
to get right so the threads do not conflict with each other, and there is no
guarantee which core they run on unless you use a package that manages that.
A sketch of this pattern using the standard library is below.
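Here is a minimal sketch of that queue-of-work pattern using the standard
library's queue and threading modules; process_one() is a made-up placeholder
for whatever analysis you run on a single file:

import queue
import threading

N_WORKERS = 8

def process_one(name):
    # Placeholder for the real per-file analysis.
    print("processing", name)

def worker(q):
    while True:
        name = q.get()
        if name is None:          # sentinel: no more work
            return
        process_one(name)

work = queue.Queue()
for name in ["a.txt", "b.txt", "c.txt"]:   # stand-in file list
    work.put(name)

threads = [threading.Thread(target=worker, args=(work,))
           for _ in range(N_WORKERS)]
for t in threads:
    t.start()
for _ in threads:
    work.put(None)                # one sentinel per worker
for t in threads:
    t.join()

In modern Python, concurrent.futures.ThreadPoolExecutor wraps this same
pattern in a few lines.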

[snip]

Because of the GIL, only one thread at a time can be executing Python bytecode, so if the work is processor-intensive, it's better to use multiprocessing. A minimal sketch follows.
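For example, with multiprocessing.Pool (process_one() and the file list are
placeholders):

import multiprocessing

def process_one(name):
    # Placeholder for the real CPU-bound per-file work.
    return name.upper()

if __name__ == "__main__":
    filenames = ["a.txt", "b.txt", "c.txt"]   # stand-in list
    with multiprocessing.Pool(processes=8) as pool:
        results = pool.map(process_one, filenames)
    print(results)

The if __name__ == "__main__" guard matters here: on platforms that spawn
rather than fork, each worker re-imports the module.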

Of course, if it's already maxing out the disk, then using more cores won't make it faster.
