Yes, that's true, error messages help. I was hoping there was some
documentation to see what I've done wrong. I can't easily cut and
paste errors from my cluster.

Here's a snippet (hand-typed) of the error message; it does look
like a rank communications error:

ORTE_ERROR_LOG: A message is attempting to be sent to a process whose
contact information is unknown in file rml_oob_send.c at line 145.
*** MPI_INIT failure message (snipped) ***
orte_grpcomm_modex failed
--> Returned "A message is attempting to be sent to a process whose
contact information is unknown" (-117) instead of "Success" (0)

This message repeats for each rank and ultimately hangs the srun, which I
have to Ctrl-C to terminate.

I have MpiPorts defined in my Slurm config, and running srun with
--resv-ports does show the SLURM_RESV_PORTS environment variable
getting passed to the shell.
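
Seeing the variable in a shell does not by itself show that every launched
task gets it. A minimal sketch of a per-task check follows. It assumes Python
is available on the compute nodes and only reads SLURM_PROCID and
SLURM_RESV_PORTS from the environment; the function name and the script name
check_ports.py are made up for illustration, and no assumption is made about
the format of the port list.

```python
import os
import socket


def describe_resv_ports(env):
    """Return a one-line report on whether SLURM_RESV_PORTS is visible.

    `env` is any mapping of environment variables (os.environ in practice).
    The value is reported verbatim; its format is not interpreted here.
    """
    rank = env.get("SLURM_PROCID", "?")
    ports = env.get("SLURM_RESV_PORTS")
    if ports:
        return f"task {rank}: SLURM_RESV_PORTS={ports}"
    return f"task {rank}: SLURM_RESV_PORTS is not set"


if __name__ == "__main__":
    # Each task launched by srun prints its own host and what it sees.
    print(socket.gethostname(), describe_resv_ports(os.environ))
```

Run as something like "srun --resv-ports -n 4 python check_ports.py". If any
task reports the variable as not set, that would be consistent with the
reservation not being passed down to the MPI procs.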


On Thu, Dec 23, 2010 at 8:09 PM, Ralph Castain <r...@open-mpi.org> wrote:
> I'm not sure there is any documentation yet - not much clamor for it. :-/
>
> It would really help if you included the error message. Otherwise, all I can 
> do is guess, which wastes both of our time :-(
>
> My best guess is that the port reservation didn't get passed down to the MPI 
> procs properly - but that's just a guess.
>
>
> On Dec 23, 2010, at 12:46 PM, Michael Di Domenico wrote:
>
>> Can anyone point me towards the most recent documentation for using
>> srun and openmpi?
>>
>> I followed what i found on the web with enabling the MpiPorts config
>> in slurm and using the --resv-ports switch, but I'm getting an error
>> from openmpi during setup.
>>
>> I'm using Slurm 2.1.15 and Openmpi 1.5 w/PSM
>>
>> I'm sure I'm missing a step.
>>
>> Thanks
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
