Currently, I am in the process of converting an MPMD program of mine from
LAM to OpenMPI.  The old LAM setup used an application schema to handle the
launching of the server and remote processes on all the nodes in the
cluster; however, I have run into an issue due to the difference in how
mpirun works in both.  Because mpirun will route STDIN and STDOUT on remote
processes to the location of STDIN and STDOUT where mpirun was originally
run, I use a shell to launch the remote processes on all the nodes.  In
other words, I have mpirun start a shell (/bin/sh) on all the nodes and pass
to it a string of runtime variables to be passed into the executable that is
started by the shell.  By using the shell’s “-c” option, I can start a
process this way and it allows me to control the STDIN and STDOUT of the
process.  Below is an example application schema that works in LAM but not
OpenMPI (obviously the –mca option doesn’t exist in LAM but its equivalence
would).   When trying to use this below in OpenMPI, I get EOF file parsing
errors because OpenMPI does not know what to do with the variables listed in
the quotations.  It will parse the first quote, the program and its path,
then errors trying to look for a matching quote when it should have kept on
reading in all the runtime variables located in this string.  How do I get
this entire string to be passed by mpirun so that the shell can execute the
corresponding process and pass the associated runtime variables to it.


#Example Application Schema

#server

-host node1 --mca btl tcp,self –np 1 /bin/sh –c “/usr/bin/SERVER_PROG
--varTen blah” > myOwnLogfile_server.log 2>&1”

#node2

-host node2 --mca btl tcp,self -np 1 /bin/sh -c “/usr/bin/REMOTE_PROG
--varOne 59339 --varTwo 65888” > myOwnLogfile_remote1.log 2>&1”

#node3

-host node3 --mca btl tcp,self -np 1 /bin/sh -c “/usr/bin/REMOTE_PROG
--varOne 59339 --varTwo 65888” > myOwnLogfile_remote2.log 2>&1”

#node4

-host node4 --mca btl tcp,self -np 1 /bin/sh -c “/usr/bin/REMOTE_PROG
--varOne 59339 --varTwo 65888” > myOwnLogfile_remote3.log 2>&1”

Reply via email to