Yes, that's true, error messages help. I was hoping there was some documentation that would show me what I've done wrong. I can't easily cut and paste errors from my cluster.
Here's a hand-typed snippet of the error message; it does look like a rank communication error:

ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 145.
*** MPI_INIT failure message (snipped) ***
orte_grpcomm_modex failed
--> Returned "A message is attempting to be sent to a process whose contact information is unknown" (-117) instead of "Success" (0)

This message repeats for each rank and ultimately hangs the srun, which I have to terminate with Ctrl-C.

I have MpiPorts defined in my slurm config, and running srun with --resv-ports does show the SLURM_RESV_PORTS environment variable getting passed to the shell.

On Thu, Dec 23, 2010 at 8:09 PM, Ralph Castain <r...@open-mpi.org> wrote:
> I'm not sure there is any documentation yet - not much clamor for it. :-/
>
> It would really help if you included the error message. Otherwise, all I can
> do is guess, which wastes both of our time :-(
>
> My best guess is that the port reservation didn't get passed down to the MPI
> procs properly - but that's just a guess.
>
>
> On Dec 23, 2010, at 12:46 PM, Michael Di Domenico wrote:
>
>> Can anyone point me towards the most recent documentation for using
>> srun and openmpi?
>>
>> I followed what i found on the web with enabling the MpiPorts config
>> in slurm and using the --resv-ports switch, but I'm getting an error
>> from openmpi during setup.
>>
>> I'm using Slurm 2.1.15 and Openmpi 1.5 w/PSM
>>
>> I'm sure I'm missing a step.
>>
>> Thanks
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users