On Mar 5, 2010, at 2:38 PM, Ralph Castain wrote:

>> CALL SYSTEM("cd " // TRIM(dir) // " ; mpirun -machinefile ./machinefile -np
>> 1 /home01/group/Execute/DLPOLY.X > job.out 2> job.err ; cd - > /dev/null")
>
> That is guaranteed not to work.  The problem is that mpirun sets
> environmental variables for the original launch.  Your system call carries
> over those envars, causing mpirun to become confused.
>
> You should be able to use MPI_COMM_SPAWN to launch this MPI job.  Check the
> man page for MPI_COMM_SPAWN; I believe we have info keys to specify things
> like what hosts to launch on, etc.
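Something along these lines is probably what Ralph has in mind -- a rough,
untested Fortran sketch.  The "host" info key value, the hostnames, and the
hard-coded path are only placeholders; check the MPI_Comm_spawn man page for
the info keys your Open MPI version actually supports:

  program spawn_dlpoly
    use mpi
    implicit none
    integer :: ierr, info, intercomm
    integer :: errcodes(1)

    call MPI_Init(ierr)

    ! Tell Open MPI where to place the child job (placeholder host list)
    call MPI_Info_create(info, ierr)
    call MPI_Info_set(info, "host", "node01,node02", ierr)

    ! Launch one copy of DL_POLY as a child MPI job; intercomm connects
    ! the parent process(es) to the child process(es)
    call MPI_Comm_spawn("/home01/group/Execute/DLPOLY.X", MPI_ARGV_NULL, 1, &
                        info, 0, MPI_COMM_SELF, intercomm, errcodes, ierr)
    call MPI_Info_free(info, ierr)

    ! ... wait for the child job to finish (one way of doing that is
    ! sketched further down in this thread), then ...
    call MPI_Finalize(ierr)
  end program spawn_dlpoly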
>> Do you think MPI_COMM_SPAWN can help?
>
> It's the only method supported by the MPI standard.  If you need it to block
> until this new executable completes, you could use a barrier or other MPI
> method to determine it.

I believe the user said they wanted the new job to use the same cores that
their original MPI job occupies -- they basically want the old job to block
until the new job completes.

Keep in mind that OMPI busy-polls waiting for progress, so you might actually
get hosed here (two procs competing for time on the same core).  I'm not
immediately thinking of a good way to avoid this issue -- perhaps you could
kludge something up such that the parent job polls on sleep() and checks to
see whether a message has arrived from the child (i.e., the last thing the
child does before it calls MPI_FINALIZE is to send a message to its parents
and then MPI_COMM_DISCONNECT from its parents).  If the parent finds that it
has a message from the child(ren), it can MPI_COMM_DISCONNECT and continue
processing.  Kinda hacky, but it might work...?  (A rough sketch of this
hand-off is appended below.)

-- 
Jeff Squyres
jsquy...@cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
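The hand-off described above might look roughly like the following on each
side.  This is only a sketch: the message tag, the zero-length payload, and
the sleep() call (a common compiler extension, not standard Fortran) are all
placeholder choices.

  ! Parent side, after MPI_Comm_spawn has returned intercomm:
  logical :: flag
  integer :: buf, status(MPI_STATUS_SIZE), ierr

  flag = .false.
  do while (.not. flag)
     ! Check, without blocking, whether the child has announced completion
     call MPI_Iprobe(MPI_ANY_SOURCE, 0, intercomm, flag, status, ierr)
     if (.not. flag) call sleep(1)   ! yield the core instead of busy-polling
  end do
  call MPI_Recv(buf, 0, MPI_INTEGER, status(MPI_SOURCE), 0, intercomm, &
                MPI_STATUS_IGNORE, ierr)
  call MPI_Comm_disconnect(intercomm, ierr)

  ! Child side, just before it calls MPI_Finalize:
  integer :: parent, buf, ierr
  call MPI_Comm_get_parent(parent, ierr)
  if (parent /= MPI_COMM_NULL) then
     call MPI_Send(buf, 0, MPI_INTEGER, 0, 0, parent, ierr)   ! "I'm done"
     call MPI_Comm_disconnect(parent, ierr)
  end if
  call MPI_Finalize(ierr)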