Hi all... I've written a class to provide an interface to popen; I've included the actual select() loop below. I'm finding that "sometimes" popen'd processes take "a really long time" to complete and "other times" I get incomplete stdout.
E.g.:

- On boxA, ffmpeg returns in ~25s; on boxB (comparable hardware, identical OS), it takes ~5m.
- ``ls'' on a directory with 15 nodes returns full stdout; ``ls -R'' on that same directory (~32K nodes beneath) stops after 4097KB of output.

The code in question is running on Linux 2.6.x; no cross-platform portability is desired. popen'd commands will never be interactive; I just wanna read stdout/stderr and perhaps feed a one-shot string to the process via stdin.

Here's the relevant code (stripped of comments and various OO setup/output stuff):

# # ## ### ##### ######## ############# ##################### # cut here

def run(self):
    import os, select, syslog
    (_stdin, _stdout, _stderr) = os.popen3(self.command)
    stdoutChunks = []
    stderrChunks = []
    readList = [_stdout, _stderr]
    if self.stdinString != "":
        writeList = [_stdin]
    else:
        writeList = []
    readStderr = False
    readStdout = False
    i = 0
    while True:
        i += 1
        (r, w, x) = select.select(readList, writeList, [], 1)
        read = ""
        if self.stdinString != "":
            if w:
                bytesWritten = os.write(_stdin.fileno(), self.stdinString)
                writeList.remove(_stdin)
                _stdin.close()
                continue
        if r:
            if _stderr in r:
                readStderr = True
                read = os.read(_stderr.fileno(), 16384)
                if read:
                    stderrChunks.append(read)
                else:
                    readList.remove(_stderr)
                continue
            elif _stdout in r:
                readStdout = True
                read = os.read(_stdout.fileno(), 16384)
                if read:
                    stdoutChunks.append(read)
                    syslog.syslog("Command instance read %d from stdout" % len(read))
                else:
                    readList.remove(_stdout)
                continue
        else:
            if (readStderr and self.dieOnStderr) or readStdout:
                syslog.syslog("Command instance finished")
                break
    return

# cut here # # ## ### ##### ######## ############# #####################

Tweaking (a) the os.read() buffer size and (b) the select() timeout, and testing with ``ls -R'' on a directory with ~32K nodes beneath, I find the following trends:

1. With a very small os.read() buffer, I get full stdout, but the running time is rather long; running time increases as the select() timeout increases.

2. With a very large os.read() buffer, I get incomplete stdout (but the running time is *very* fast). As the select() timeout increases, I get better and better results - with a select() timeout of 0.2 I seem to get reliably full stdout.

The values used in the code I've pasted above - large buffer, large select() timeout - seem to perform "well enough"; none of the previously described problems manifest. However, ``ls -lR /'' (way more than 32K nodes) "sometimes" gives incomplete stdout.

My first question, then, is paranoid: I've run all these benchmarks because the application using this code took a HUGE performance hit when we started popen'ing commands which generate "lots of" output. Is there anything wrong with the logic in my code?! Will I see severe performance degradation (or worse, incomplete stdout/stderr) as system variables change (e.g. system load increases, the popen'd program changes, the popen'd program's workload increases, etc.)?

Next question - how do I tune the select() timeout and the os.read() buffer size correctly? Is it *really* per-command, per-system, per-phase-of-moon voodoo? Is there a Recommended Setup for such a select() loop?
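For what it's worth, here's a stripped-down sketch of what I *think* the loop would look like if I drove it purely by EOF instead of by the select() timeout - i.e. keep select()ing until both pipes have returned an empty read, and use the timeout only as a poll interval. The function name is made up, and I've omitted the dieOnStderr and syslog bits for brevity; I haven't beaten on this variant nearly as hard:

# cut here
import os, select

def run_until_eof(command, stdinString=""):
    # Illustrative standalone variant; completion is decided by EOF
    # (empty reads) on both pipes, never by the clock.
    (_stdin, _stdout, _stderr) = os.popen3(command)
    stdoutChunks = []
    stderrChunks = []
    readList = [_stdout, _stderr]
    if stdinString != "":
        writeList = [_stdin]
    else:
        writeList = []
        _stdin.close()
    while readList:
        # The timeout is now just a poll interval, so its exact value
        # should affect latency only, not correctness.
        (r, w, x) = select.select(readList, writeList, [], 1)
        if w:
            os.write(_stdin.fileno(), stdinString)
            writeList.remove(_stdin)
            _stdin.close()
        for f in r:
            chunk = os.read(f.fileno(), 16384)
            if chunk:
                if f is _stdout:
                    stdoutChunks.append(chunk)
                else:
                    stderrChunks.append(chunk)
            else:
                # Empty read == EOF on this pipe; stop select()ing on it.
                readList.remove(f)
    return ("".join(stdoutChunks), "".join(stderrChunks))
# cut here

The idea being that an empty read is the only reliable end-of-output signal - a quiet second on the pipe isn't. Does that sound right?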
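And in case it's relevant: I know the new subprocess module (Python 2.4+) will do this whole dance for me - something like the untested sketch below, where communicate() runs its own select() loop internally and returns only once both pipes hit EOF - but I'd still like to understand why my hand-rolled loop misbehaves:

# cut here
import subprocess

def run_with_subprocess(command, stdinString=""):
    # Sketch only; shell=True to match os.popen3's shell semantics.
    p = subprocess.Popen(command, shell=True,
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    # communicate() writes stdinString, closes stdin, and drains
    # stdout/stderr to EOF before returning.
    (out, err) = p.communicate(stdinString)
    return (out, err, p.returncode)
# cut here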
Thanks in advance, for insight as well as for tolerating my long-windedness...

--
Christopher DeMarco <[EMAIL PROTECTED]>
Alephant Systems (http://alephant.net)
PGP public key at http://pgp.alephant.net
+1-412-708-9660