On 5/23/19 6:32 PM, Cameron Simpson wrote:
On 23May2019 17:04, bvdp <b...@mellowood.ca> wrote:
Anyway, yes, the problem is that I was naively using command.getoutput()
which blocks until the command is finished. So, of course, only one process
was being run at one time! Bad me!
I guess I should be looking at subprocess.Popen(). Now, a more relevant
question ... if I do it this way, I then need to poll through a list of saved
process IDs to see which have finished? Right? My initial thought is to
batch them up in small groups (say CPU_COUNT-1) and wait for that batch to
finish, etc. Would it be foolish to send a large number (1200 in this case,
since this is the number of files), let the OS worry about scheduling, and
have my program poll 1200 IDs?
Someone mentioned the GIL. If I launch separate processes then I don't
encounter this issue? Right?
Yes, but it becomes more painful to manage. If you're issuing distinct,
separate commands anyway, dispatch many or all of them and then wait for
them as a distinct step. If the commands start thrashing the rest of the OS
resources (such as the disc), then you may want to add some capacity
limitation, such as a counter or semaphore, to limit how many run at once.
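A minimal sketch of "dispatch, then wait as a separate step" with a counter-style cap, assuming each command is an argv list. The names `run_all` and `max_running` are hypothetical, and checking `Popen.poll()` every 50 ms is just one simple way to notice finished children; passing `max_running=len(commands)` dispatches everything at once.

```python
import subprocess
import time

def run_all(commands, max_running):
    """Start each argv list with Popen, keeping at most max_running alive.

    Returns a dict mapping each command's index to its exit code.
    """
    pending = list(enumerate(commands))  # (index, argv) pairs not yet started
    running = {}  # Popen object -> index of its command
    results = {}
    while pending or running:
        # Top up to the cap; this counter is the "capacity limitation".
        while pending and len(running) < max_running:
            index, argv = pending.pop(0)
            running[subprocess.Popen(argv)] = index
        # Non-blocking check of every running child; poll() returns None
        # while the child is still alive.
        for proc in list(running):
            if proc.poll() is not None:
                results[running.pop(proc)] = proc.returncode
        time.sleep(0.05)  # avoid spinning the CPU between checks
    return results
```

With 1200 files and a cap of, say, the CPU count, this never holds more than that many children at once, yet it starts the next one as soon as any slot frees up rather than waiting for a whole batch.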
Now, waiting for a subcommand can be done in a few ways.
If you're the parent of all the processes, you can keep a set() of the
issued process IDs and then call os.wait() repeatedly; each call returns the
pid (and exit status) of a completed child process. Check it against your
set. If you need to act on the specific process, use a dict to map pids to
some record of the subprocess.
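A sketch of the os.wait() approach, assuming a Unix system (os.wait and os.spawnvp are not available on Windows). It launches children with os.spawnvp(os.P_NOWAIT, ...), which returns the pid immediately, so nothing else tries to reap them; the dict maps each pid to the index of its command as the "record of the subprocess". The function name `spawn_and_reap` is hypothetical.

```python
import os

def spawn_and_reap(commands):
    """Start every argv list without waiting, then reap them with os.wait().

    Unix-only sketch. Returns a dict mapping each command's index to its
    exit code, or -1 if the child was killed by a signal.
    """
    by_pid = {}  # pid -> index of the command that produced it
    for index, argv in enumerate(commands):
        # P_NOWAIT returns the child's pid at once; spawnvp searches PATH.
        pid = os.spawnvp(os.P_NOWAIT, argv[0], argv)
        by_pid[pid] = index
    results = {}
    while by_pid:
        pid, status = os.wait()  # blocks until *any* child exits
        if pid not in by_pid:
            continue  # some other child of this process; not ours to track
        index = by_pid.pop(pid)
        # Decode the raw wait status into an exit code.
        results[index] = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
    return results
```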
Alternatively, you can spawn a Python Thread for each subcommand, have
the Thread dispatch the subcommand _and_ wait for it (i.e. keep your
command.getoutput() method, but in a Thread). The main programme waits for
the Threads by join()ing them.
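A sketch of the one-Thread-per-subcommand approach, assuming Python 3, where getoutput() lives in the subprocess module. Each thread blocks in its own call, but the GIL is released while it waits on the child, so the children still run concurrently. `run_in_threads` is a hypothetical name.

```python
import subprocess
import threading

def run_in_threads(commands):
    """Run each shell command string in its own Thread; return the outputs."""
    outputs = [None] * len(commands)

    def worker(index, cmd):
        # getoutput() runs cmd through the shell and returns combined
        # stdout/stderr with one trailing newline stripped.
        outputs[index] = subprocess.getoutput(cmd)

    threads = [threading.Thread(target=worker, args=(i, cmd))
               for i, cmd in enumerate(commands)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # the main programme waits here for every worker
    return outputs
```

For 1200 files this means 1200 threads and 1200 simultaneous children, so a cap like the one described earlier may still be worth adding.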
I'll just note, because no one else has brought it up yet, that the
standard concurrent.futures module exists for exactly this: rather than
manually creating threads and/or process pools for all these things, you
can let it manage them. It's a fairly brilliant wrapper around all this
stuff, and I feel like it often doesn't get enough love.
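A minimal concurrent.futures sketch, assuming argv-list commands. A ThreadPoolExecutor is enough here because the real work happens in the child processes; for CPU-bound pure-Python work, ProcessPoolExecutor offers the same interface and sidesteps the GIL. `run_pool` is a hypothetical name, and defaulting to os.cpu_count() workers is just one reasonable choice.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_pool(commands, max_workers=None):
    """Run argv lists on a thread pool; return {index: exit code}."""
    if max_workers is None:
        max_workers = os.cpu_count() or 1  # os.cpu_count() may return None

    def run_one(argv):
        # Each worker thread blocks here; the child process does the work.
        return subprocess.run(argv).returncode

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # submit() queues every job up front; the pool runs max_workers at a time.
        futures = {pool.submit(run_one, argv): i for i, argv in enumerate(commands)}
        # as_completed() yields each future as soon as it finishes.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

Submitting all 1200 jobs at once is fine here: the pool does the batching, and as_completed() replaces hand-written polling of pids.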
--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.