Hi John

I'm afraid that the straightforward approach you're trying isn't going to
work with Open MPI in its current implementation. I had plans to support
this kind of operation, but that hasn't happened yet. And as you discovered,
you cannot run mpiexec/mpirun in the background, and the "do-not-wait"
option doesn't work (it may even be turned "off" by now, depending on which
version you are using).

Your best bet would be to put a call in your first executable to "spawn" the
second executable. You don't need to do this via MPI - you can do it
directly from a non-MPI program by calling the appropriate RTE function.
Several OpenRTE (the RTE underneath Open MPI) users do this regularly,
myself included.

I don't know what version you are using, but assuming it is 1.2 or the
"trunk", you will find an example of this in the test program
orte/test/system/orte_spawn.c. I can provide advice/details on how to make
this work, if needed (probably best done off-list, or use the OpenRTE
this work, if needed (probably best done off-list, or use the OpenRTE
mailing lists - see http://www.open-rte.org).
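
For reference, if your first executable did happen to use MPI, the MPI-level
way to do the same thing is MPI_Comm_spawn; the ORTE call shown in
orte_spawn.c is the non-MPI analogue. A rough sketch (with
"./second-program" as a placeholder name, not anything from your setup)
would look something like this:

#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm child;
    int errcode;

    MPI_Init(&argc, &argv);

    /* Launch one copy of the second executable under the same run-time;
       "./second-program" is just a placeholder name. */
    MPI_Comm_spawn("./second-program", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &child, &errcode);

    /* ...the first program carries on with its own work here... */

    MPI_Comm_disconnect(&child);
    MPI_Finalize();
    return 0;
}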

Ralph



On 4/23/07 11:18 PM, "John Borchardt" <john.borcha...@gmail.com> wrote:

> Greetings,
> 
> I was hoping someone could help me with the following situation.  I have a
> program with no MPI support that I'd like to run "in parallel" by running a
> portion of my total task on N CPUs of a PBS/Maui/Open-MPI cluster.  (The
> algorithm is such that there is no real need for MPI; I am just as well off
> running N independent processes on N CPUs as I would be if I added MPI
> support to my program and then ran it on N CPUs.)
> 
> So it's easy enough to set up a Perl script to submit N jobs to the queue to
> run on N nodes.  But my cluster has two CPUs per node, and I am not
> RAM-limited, so I'd like to run two serial jobs per node, one on each CPU.
> From what my admin tells me, I must use the mpiexec command to run my
> program so that the scheduler knows to run it on the nodes it has assigned
> to me.
> 
> In my PBS script (this is one of N/2 similar scripts),
> 
> #!/bin/bash
> #PBS -l nodes=1:ppn=2
> #PBS -l walltime=1:00:00:00
> mpiexec -pernode program-executable < inputfile1 > outputfile1
> mpiexec -pernode program-executable < inputfile2 > outputfile2
> 
> does not have the desired effect.  It appears that (1) the second process
> waits for the first to finish, and (2) MPI or the scheduler (I can't tell
> which) tries to restart the program a few times (you can see this in the
> output files).  Adding an ampersand to the first mpiexec line appears to
> cause mpiexec to crash, and the job does not run at all.  Using:
> 
> mpiexec -np 1 program-executable < inputfile > outputfile
> 
> avoids the strange re-start problem I mentioned above, but of course does not
> use both CPUs on a node.
> 
> 
> Maybe I am making a simple mistake, but I am quite new to cluster computing...
> Any help you can offer is greatly appreciated!
> 
> 
> Thanks,
> 
> --John Borchardt