Greetings,

I was hoping someone could help me with the following situation. I have a program with no MPI support that I'd like to run "in parallel" by running a portion of my total task on each of N CPUs of a PBS/Maui/Open-MPI cluster. (The algorithm is such that there is no real need for MPI; I am just as well off running N independent processes on N CPUs as I would be adding MPI support to the program and then running it on N CPUs.)
So it's easy enough to set up a Perl script that submits N jobs to the queue to run on N nodes. But my cluster has two CPUs per node, and I am not RAM-limited, so I'd like to run two serial jobs per node, one on each CPU. My admin tells me I must launch my program with the mpiexec command so that the scheduler runs it on the nodes it has assigned to me. My PBS script (one of N/2 similar scripts) is:

#!/bin/bash
#PBS -l nodes=1:ppn=2
#PBS -l walltime=1:00:00:00
mpiexec -pernode program-executable < inputfile1 > outputfile1
mpiexec -pernode program-executable < inputfile2 > outputfile2

but it does not have the desired effect. It appears that (1) the second process waits for the first to finish, and (2) MPI or the scheduler (I can't tell which) tries to restart the program a few times (you can see this in the output files). Adding an ampersand to the first mpiexec line appears to make mpiexec crash, and then the job does not run at all. Using

mpiexec -np 1 program-executable < inputfile > outputfile

avoids the strange restart problem mentioned above, but of course it uses only one of the two CPUs on the node.

Maybe I am making a simple mistake, but I am quite new to cluster computing. Any help you can offer is greatly appreciated!

Thanks,
--John Borchardt
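P.S. In case the submission side matters, the Perl wrapper is roughly a loop like the sketch below; the job count, file names, and script names are just placeholders, and each generated file is one of the N/2 PBS scripts shown above.

#!/usr/bin/perl
use strict;
use warnings;

# Rough sketch of the submission wrapper: write one PBS script per pair
# of input files and hand it to qsub.  File names and the job count are
# placeholders.
my $njobs = 10;    # N/2 scripts, two serial runs per script

for my $i ( 1 .. $njobs ) {
    my $first  = 2 * $i - 1;
    my $second = 2 * $i;
    my $script = "job$i.pbs";

    open my $fh, '>', $script or die "cannot write $script: $!";
    print $fh <<"END";
#!/bin/bash
#PBS -l nodes=1:ppn=2
#PBS -l walltime=1:00:00:00
mpiexec -pernode program-executable < inputfile$first > outputfile$first
mpiexec -pernode program-executable < inputfile$second > outputfile$second
END
    close $fh;

    # submit the generated script; stop if qsub reports an error
    system( 'qsub', $script ) == 0 or die "qsub failed for $script";
}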