Bob,

As others have noted, you have not made it clear how what you are doing is running "in parallel."
I have a similar need: thousands of folders, each needing an analysis based on its contents, one folder at a time. I have 8 cores available, but run linearly the job could take months. The results are written back into the same folder, so each part can run independently as long as shared resources such as memory are not abused.

Your need is conceptually simple: break the list of filenames into N batches of roughly equal length. A simple approach is to open N terminal or command windows and, in each one, start a Python interpreter by hand running the same program, with each instance given one of the file lists to work on (there is a small sketch of the batching step below). Some instances may finish well ahead of others, of course. If anything they do writes to shared resources such as log files, you need to be careful, and there is no guarantee that several will not end up on the same CPU. There is also plenty of overhead in launching full processes. I am not really recommending this, but it is easy to do and may give you enough of a speedup; since your whole run only takes a couple of minutes, though, the gain will be modest.

Quite a few other solutions use some form of threads running within a single process, perhaps with a queue manager. Python has several ways to do this. You feed all the information needed (file names, in your case) to a manager that owns a queue; it starts up to N worker threads and, whenever one finishes, wakes up and starts a replacement until everything is done. Unless one item takes very long, the workers should all finish reasonably close together. Again, there are details to get right so the threads do not conflict with each other, and there is no guarantee which core each one gets unless you use an underlying package that manages that. So it is worth researching packages that do most of this work for you and provide some guarantees; a sketch using only the standard library follows below.

An interesting question is how to choose N. Just because you have N cores does not mean you should use all N. Other things are happening on the same machine, sometimes with thousands of processes or threads in the scheduler's queue even when the machine is sitting there effectively doing nothing, and if you also keep other applications open (mailer, Word, browsers, ...) you need to leave some headroom so everything else gets enough attention. So is N-1 or N-2 better? Then again, if your task mixes CPU and I/O activity, it may make sense to run more than N workers in parallel even if several of them end up on the same core, since they can interleave: one uses the CPU while the others wait on I/O or anything else slower.

I am curious to hear what you end up with. I will be watching to see whether others can point to modules that already support something like this, with you supplying just a function to run for each work item.

I also suggest you consider your architecture carefully. Sometimes it is better to run program A (in Python or anything else) that sets everything up, including saving to disk whatever data structures each individual run needs. Then a second program reads that input, does the parallel computations, and writes out what is needed, such as log entries or data in a CSV. Finally, when it is all done, a third program gathers the various outputs and produces a consolidated set of results. That is extra work, but it minimizes the chance of the processes interfering with each other, and it also lets you run or re-run smaller batches, or even farm the work out to other machines; a rough skeleton of that split is sketched below.
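For what it is worth, here is a minimal sketch of the batching step. The name filelist is taken from your loop; the batch-file names and the worker script name are placeholders of mine, not anything from your script:

    # Split a list of file names into N batches of roughly equal length and
    # write each batch to its own text file, one name per line.  Each batch
    # file can then be handed to a separately started Python process.
    N = 4   # however many parallel runs you settle on

    def write_batches(filelist, n=N):
        batches = [filelist[i::n] for i in range(n)]      # round-robin split
        for k, batch in enumerate(batches):
            with open("batch%02d.txt" % k, "w") as f:
                f.write("\n".join(batch) + "\n")

Each of the N windows then runs something like "python worker.py batch02.txt", where worker.py is just your existing one-at-a-time loop reading its file names from the named batch file.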
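Here is the sort of thing I had in mind for the queue-of-workers approach, using only the standard library's concurrent.futures module. I am assuming, as your post suggests, that doit() just launches an external program for each file; in that case plain threads are fine, because each thread spends its time waiting on its child process. (If doit() did heavy work in pure Python, ProcessPoolExecutor would be the drop-in alternative.) The command name "someprogram" below is made up:

    import concurrent.futures
    import subprocess

    MAX_WORKERS = 4   # the "N" discussed above; worth tuning rather than assuming N == cores

    def doit(name):
        # Stand-in for your existing doit(): run the external program on one file.
        subprocess.run(["someprogram", name], check=True)

    def run_all(filelist):
        # The executor keeps an internal queue of pending file names and has up
        # to MAX_WORKERS of them in flight at once; as soon as one worker
        # finishes, the next queued name is started.
        with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
            futures = {pool.submit(doit, name): name for name in filelist}
            for fut in concurrent.futures.as_completed(futures):
                name = futures[fut]
                try:
                    fut.result()              # re-raises anything doit() raised
                except Exception as exc:
                    print("failed:", name, exc)

This does not pin anything to a core; the operating system still decides placement, but because the external programs are separate processes they are free to spread across all the cores.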
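And here is a rough skeleton of the three-stage split (set up, compute in parallel, consolidate). Every name in it, the work directory, the job-file layout, the analyse() placeholder, is invented for illustration:

    import csv, glob, json, os

    # Stage A: write one small "work unit" file per job.
    def stage_a(filelist, workdir="work"):
        os.makedirs(workdir, exist_ok=True)
        for k, name in enumerate(filelist):
            with open(os.path.join(workdir, "job%05d.json" % k), "w") as f:
                json.dump({"input": name}, f)

    # Stage B: run many copies of this in parallel, each reading one work unit
    # and writing its own result file, so the parallel runs share nothing.
    def stage_b_one(jobfile):
        with open(jobfile) as f:
            job = json.load(f)
        result = analyse(job["input"])
        with open(jobfile.replace(".json", ".out.csv"), "w", newline="") as f:
            csv.writer(f).writerow([job["input"], result])

    def analyse(name):
        # placeholder for the real per-file analysis
        return os.path.getsize(name)

    # Stage C: one final pass gathers every partial result into a single report.
    def stage_c(workdir="work", report="report.csv"):
        with open(report, "w", newline="") as out:
            writer = csv.writer(out)
            for part in sorted(glob.glob(os.path.join(workdir, "*.out.csv"))):
                with open(part, newline="") as f:
                    writer.writerows(csv.reader(f))

Because each stage only talks to the others through files on disk, you can re-run just stage B for a subset of the job files, or copy a slice of the work directory to another machine and run it there.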
If you create a few thousand directories (or just files) with names like do0001, you can copy them to another machine and ask it to work on do0*, ask yet another machine to work on do1*, and so on, all using the same script. That makes more sense for my project, which may literally take months or years if run exhaustively, since it amounts to a grid search over a huge number of combinations.

Good luck.

Avi

-----Original Message-----
From: Python-list <python-list-bounces+avigross=verizon....@python.org> On Behalf Of Bob van der Poel
Sent: Thursday, May 23, 2019 2:40 PM
To: Python <python-list@python.org>
Subject: More CPUs doen't equal more speed

I've got a short script that loops through a number of files and processes them one at a time. I had a bit of time today and figured I'd rewrite the script to process the files 4 at a time by using 4 different instances of python. My basic loop is:

    for i in range(0, len(filelist), CPU_COUNT):
        for z in range(i, i+CPU_COUNT):
            doit( filelist[z])

With the function doit() calling up the program to do the lifting. Setting CPU_COUNT to 1 or 5 (I have 6 cores) makes no difference in total speed. I'm processing about 1200 files and my total duration is around 2 minutes. No matter how many cores I use, the total is within a 5 second range.

This is not a big deal ... but I really thought that throwing more processors at a problem was a wonderful thing :) I figure that the cost of loading the python libraries and my source file and writing it out are pretty much i/o bound, but that is just a guess. Maybe I need to set my sights on bigger, slower programs to see a difference :)

--
**** Listen to my FREE CD at http://www.mellowood.ca/music/cedars ****
Bob van der Poel ** Wynndel, British Columbia, CANADA **
EMAIL: b...@mellowood.ca
WWW: http://www.mellowood.ca