Hello everyone, and thank you for taking the time to read this. For quite some time I have had a problem using Python's shell execution facilities in combination with a cluster computing environment such as Sun Grid Engine (SGE). In particular, I wish to repeatedly execute a number of commands in sub-shells or pipes from within a single function, and each execution depends on the result of the previous one, so simply writing out a brute-force script file of commands in advance and executing it is not an option for me.
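To make the dependency concrete, the pattern in my real code is roughly the following sketch; "trainModel" and "scoreModel" are only placeholders for the actual programs in my cross-validation procedure:

----------------------------------------------
import os

parameter = "initialValue"
for fold in range( 0, 10 ):
    # Each invocation depends on the output of the previous one.
    output = os.popen( "trainModel " + parameter ).read()
    parameter = output.strip()
    os.system( "scoreModel " + parameter )
----------------------------------------------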
To isolate and exemplify my problem, I have created three files:

 (1) a Python script which captures the spirit of the code I wish to execute;
 (2) an SGE job script which actually calls python to execute the code in (1);
 (3) a simple shell script which submits (2) enough times to fill all the processors on my computing cluster and leave an additional number of jobs in the queue.

Here is the spirit of the experiment/problem, as a generator script:

generateTest.py:
----------------------------------------------
import os

# Constants
numParallelJobs = 100
testCommand = "continue" #"os.popen( \"clear\" )"
loopSize = "1000"

# First, write the test script itself.
pythonScript = open( "testScript.py", "w" )
pythonScript.write( """
import os
for i in range( 0, """ + loopSize + """ ):
    for j in range( 0, """ + loopSize + """ ):
        for k in range( 0, """ + loopSize + """ ):
            for l in range( 0, """ + loopSize + """ ):
                """ + testCommand + """
""" )
pythonScript.close()

# Second, write the SGE script file which executes the Python script.
sgeScript = open( "testScript.sge", "w" )
sgeScript.write( """
#$ -cwd
#$ -N pythonTest
#$ -e /export/home/jbbrown/errorLog
#$ -o /export/home/jbbrown/outputLog
python testScript.py
""" )
sgeScript.close()

# Finally, write a script which submits the SGE script the specified number of times.
launchScript = open( "testScript.sh", "w" )
for i in range( 0, numParallelJobs ):
    launchScript.write( "qsub testScript.sge" + os.linesep )
launchScript.close()
----------------------------------------------

Now, assume that I have about 50 processors available across 8 compute nodes, all sharing one NFS-mounted disk. If I run the code as above, where the generated script simply executes "continue" statements and does nothing else, the cluster head node reports no serious NFS daemon load. However, if I change testCommand to the os.popen() call shown in the comment above, or to an os.system() call, the NFS daemon load on my system skyrockets within seconds of the jobs being distributed to the compute nodes, even though I am doing nothing but executing the clear-screen command, which should not send any output to the stdout log location anyway. Even if I change the SGE script file to redirect standard output and standard error explicitly to /dev/null, I still have the same problem.

I believe the source of this problem is that os.popen() or os.system() calls spawn sub-shells which then read my shell resource files (.zshrc, .cshrc, .bashrc, etc.), and those files live on the NFS-mounted home directory. But I do not see an alternative to os.popen(), os.popen2/3/4(), or os.system(). The os.exec*() family cannot solve my problem, because it transfers execution to the called program and stops executing the script which called os.exec*().

Without rewriting a considerable amount of code (which performs cross-validation by repeatedly executing programs in a sub-shell) in a shell scripting language filled with a large number of conditional statements, does anyone know of a way to execute external programs in the middle of a script without referencing the shell resource files located on an NFS-mounted directory? I have read through the help(os) documentation repeatedly, but just cannot find a solution. Even a small lead or thought would be greatly appreciated.

With thanks from humid Kyoto,
J.B. Brown
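P.S. To clarify the kind of alternative I am hoping for, here is a rough, untested sketch of what I have in mind: run the external program directly via fork/exec, so that no shell (and hence no .zshrc/.bashrc) is involved, but still return control to the calling script afterwards. The program name below is only a placeholder, and I am not sure whether this direction actually avoids the NFS load problem:

----------------------------------------------
import os

pid = os.fork()
if pid == 0:
    # Child: replace this process with the external program.
    # No shell is started, so no resource files should be read.
    os.execvp( "someProgram", [ "someProgram", "arg1" ] )
else:
    # Parent: wait for the child to finish, then continue the script.
    os.waitpid( pid, 0 )
----------------------------------------------

If something along these lines (or a library call that wraps it) is the sensible way to do this, a pointer would be very welcome.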