Greetings,

I was hoping someone could help me with the following situation.  I have a
program with no MPI support that I'd like to run "in parallel" by running a
portion of my total task on each of N CPUs of a PBS/Maui/Open-MPI cluster.
(The algorithm is such that there is no real need for MPI; I am just as well
off running N independent processes on N CPUs as I would be adding MPI
support to my program and then running it on N CPUs.)

So it's easy enough to set up a Perl script that submits N jobs to the queue
to run on N nodes.  But my cluster has two CPUs per node, and I am not
RAM-limited, so I'd like to run two serial jobs per node, one on each of the
node's CPUs.  From what my admin tells me, I must launch my program with the
mpiexec command so that it runs on the nodes the scheduler has assigned to
me.
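
To make the intent concrete: if mpiexec were not required, the per-node
script I have in mind would look roughly like the sketch below (the
executable and file names are placeholders), with both serial runs
backgrounded and a wait so the job does not exit until they both finish.

#!/bin/bash
#PBS -l nodes=1:ppn=2
#PBS -l walltime=1:00:00:00
# sketch only: two independent serial runs, one per CPU, no MPI involved
program-executable < inputfile1 > outputfile1 &
program-executable < inputfile2 > outputfile2 &
wait  # keep the PBS job alive until both background runs finish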

In my PBS script (this is one of N/2 similar scripts), I have:

#!/bin/bash
#PBS -l nodes=1:ppn=2
#PBS -l walltime=1:00:00:00
mpiexec -pernode program-executable < inputfile1 > outputfile1
mpiexec -pernode program-executable < inputfile2 > outputfile2

This does not have the desired effect.  It appears that (1) the second
process waits for the first to finish, and (2) MPI or the scheduler (I can't
tell which) tries to restart the program a few times (you can see this in
the output files).  Adding an ampersand to the end of the first mpiexec line
(sketched after the -np 1 example below) appears to make mpiexec crash, and
the job does not run at all.  Using:

mpiexec -np 1 program-executable < inputfile > outputfile

avoids the strange restart problem I mentioned above, but of course does not
use both CPUs on the node.
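
For completeness, the ampersand variant I mentioned above is just the first
script with the first mpiexec line backgrounded, roughly:

#!/bin/bash
#PBS -l nodes=1:ppn=2
#PBS -l walltime=1:00:00:00
# first run backgrounded so the second can start immediately on the other CPU
mpiexec -pernode program-executable < inputfile1 > outputfile1 &
mpiexec -pernode program-executable < inputfile2 > outputfile2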


Maybe I am making a simple mistake, but I am quite new to cluster
computing...  Any help you can offer is greatly appreciated!


Thanks,

--John Borchardt
