Hmmm... I've never seen that error. The only way to get it is if the SLURM 
launcher module (plm_slurm_module.c) is failing to properly set up the command 
line for launching the ORTE daemons.

Any particular reason to use something as old as 1.4.3? Could you upgrade to 
the 1.6 series?
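
In the meantime, a rough sketch of what could be checked from inside the salloc shell. The function name check_slurm_env is made up for illustration, and the list of variables is an assumption about what is worth inspecting, not a statement of exactly what the 1.4.3 launcher reads:

```shell
#!/bin/sh
# Hypothetical helper: report which Slurm allocation variables are visible
# in the current shell. The variable list is an assumption, chosen because
# these are standard variables Slurm exports inside an allocation.
check_slurm_env() {
    for var in SLURM_JOB_ID SLURM_NODELIST SLURM_TASKS_PER_NODE; do
        # Indirect lookup of the variable named in $var (POSIX sh compatible).
        eval "val=\${$var:-}"
        if [ -n "$val" ]; then
            echo "$var=$val"
        else
            echo "$var is NOT set"
        fi
    done
}

check_slurm_env

# Further checks to run by hand inside the allocation:
#   srun hostname                    # can Slurm itself start tasks on the nodes?
#   ompi_info | grep -i slurm        # was Open MPI built with its slurm components?
#   mpirun --mca plm_base_verbose 5 --debug-daemons mpihello
```

If any of the variables comes back as not set inside the allocation, that would be the first thing to look at. The verbose mpirun output should show how the launcher is trying to start the daemons.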

On Oct 10, 2012, at 10:44 AM, USA Linux UAE <usasoftwareengin...@gmail.com> 
wrote:

> Hello
> 
> I am using openmpi (1.4.3) with slurm (2.4.2) on Centos 6.0
> 
> I can run my jobs with mpirun on the nodes in my partition by passing the 
> "-H" option to mpirun.
> 
> But when I use slurm and run 
> 
> salloc -n 3 sh
> 
> and then submit MPI jobs using mpirun <mpibinary>
> 
> I get the following error:
> 
> salloc: Granted job allocation 289
> sh-4.1$ mpirun mpihello
> [v2:29784] [[57331,0],0] ORTE_ERROR_LOG: Not found in file plm_slurm_module.c 
> at line 350
> --------------------------------------------------------------------------
> A daemon (pid unknown) died unexpectedly on signal 1  while attempting to
> launch so we are aborting.
> 
> There may be more information reported by the environment (see above).
> 
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
> 
> 
> Is there a debugging procedure for openmpi with slurm?
> 
> Thanks
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

