Ah, yes - that is going to be a problem. The PSM transport key is generated by mpirun because it is shared info - i.e., every proc has to get the same value.
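A minimal sketch of a possible stopgap, not the patch offered in this thread. It makes two assumptions. First, the PSM MTL reads its key from the `OMPI_MCA_orte_precondition_transports` environment variable, which is the environment form of the `orte_precondition_transports` parameter named in the error quoted below. Second, the key is two zero-padded 16-digit hex values joined by a dash. The idea is to generate one key before launch and export it, so that srun hands every rank the same value, which is what mpirun would otherwise do. All helper names are hypothetical.

```python
import os
import re
import secrets
from typing import Dict, Mapping, Optional

# Assumed name: the MCA parameter "orte_precondition_transports" as an
# environment variable, which mpirun would normally set for every rank.
ENV_VAR = "OMPI_MCA_orte_precondition_transports"

# Assumed layout: two 64-bit values in zero-padded hex joined by '-'
# (33 characters in total).
KEY_PATTERN = re.compile(r"[0-9a-fA-F]{16}-[0-9a-fA-F]{16}")


def make_transport_key() -> str:
    """Generate one random key in the assumed layout."""
    first = secrets.randbits(64)
    second = secrets.randbits(64)
    return f"{first:016x}-{second:016x}"


def is_valid_transport_key(key: Optional[str]) -> bool:
    """Return True if key is present and matches the assumed layout."""
    return key is not None and KEY_PATTERN.fullmatch(key) is not None


def launch_env(base: Mapping[str, str]) -> Dict[str, str]:
    """Copy base and add a shared key unless a valid one is already set."""
    env = dict(base)
    if not is_valid_transport_key(env.get(ENV_VAR)):
        env[ENV_VAR] = make_transport_key()
    return env


if __name__ == "__main__":
    # Print a shell export line. Eval it once, then run srun so that every
    # task inherits the same value from the launching environment.
    env = launch_env(os.environ)
    print(f"export {ENV_VAR}={env[ENV_VAR]}")
```

Usage would be something like `eval "$(python3 make_psm_key.py)"` followed by the usual srun command (script name hypothetical). This relies on srun propagating the submitting shell's environment to the tasks, which is its default. The key is generated once on the launch side rather than per rank, because independently generated keys would not match across processes.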

I can create a patch that will do this for the srun direct-launch scenario, if you want to try it. Would be later today, though.


On Dec 30, 2010, at 10:31 AM, Michael Di Domenico wrote:

> Well, maybe not hooray yet. I might have jumped the gun a bit; it's
> looking like srun works in general, but perhaps not with PSM.
>
> With PSM I get this error (at least now I know what I changed):
>
> Error obtaining unique transport key from ORTE
> (orte_precondition_transports not present in the environment)
> PML add procs failed
> --> Returned "Error" (-1) instead of "Success" (0)
>
> Turn off PSM and srun works fine.
>
>
> On Thu, Dec 30, 2010 at 5:13 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> Hooray!
>>
>> On Dec 30, 2010, at 9:57 AM, Michael Di Domenico wrote:
>>
>>> I think I take it all back. I just tried it again and it seems to
>>> work now. I'm not sure what I changed (between my first and this
>>> msg), but it does appear to work now.
>>>
>>> On Thu, Dec 30, 2010 at 4:31 PM, Michael Di Domenico
>>> <mdidomeni...@gmail.com> wrote:
>>>> Yes, that's true, error messages help. I was hoping there was some
>>>> documentation to see what I've done wrong. I can't easily cut and
>>>> paste errors from my cluster.
>>>>
>>>> Here's a snippet (hand typed) of the error message, but it does look
>>>> like a rank communications error:
>>>>
>>>> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose
>>>> contact information is unknown in file rml_oob_send.c at line 145.
>>>> *** MPI_INIT failure message (snipped) ***
>>>> orte_grpcomm_modex failed
>>>> --> Returned "A message is attempting to be sent to a process whose
>>>> contact information is unknown" (-117) instead of "Success" (0)
>>>>
>>>> This msg repeats for each rank and ultimately hangs the srun, which I
>>>> have to Ctrl-C and terminate.
>>>>
>>>> I have MpiPorts defined in my slurm config, and running srun with
>>>> --resv-ports does show the SLURM_RESV_PORTS environment variable
>>>> getting passed to the shell.
>>>>
>>>>
>>>> On Thu, Dec 23, 2010 at 8:09 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> I'm not sure there is any documentation yet - not much clamor for it. :-/
>>>>>
>>>>> It would really help if you included the error message. Otherwise, all I
>>>>> can do is guess, which wastes both of our time :-(
>>>>>
>>>>> My best guess is that the port reservation didn't get passed down to the
>>>>> MPI procs properly - but that's just a guess.
>>>>>
>>>>>
>>>>> On Dec 23, 2010, at 12:46 PM, Michael Di Domenico wrote:
>>>>>
>>>>>> Can anyone point me towards the most recent documentation for using
>>>>>> srun and openmpi?
>>>>>>
>>>>>> I followed what I found on the web with enabling the MpiPorts config
>>>>>> in slurm and using the --resv-ports switch, but I'm getting an error
>>>>>> from openmpi during setup.
>>>>>>
>>>>>> I'm using Slurm 2.1.15 and Openmpi 1.5 w/PSM.
>>>>>>
>>>>>> I'm sure I'm missing a step.
>>>>>>
>>>>>> Thanks
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
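In the thread, Ralph's first guess was that the port reservation never reached the MPI procs, while Michael had only confirmed SLURM_RESV_PORTS in his own shell. Below is a minimal per-task check. It is a sketch, not an Open MPI or Slurm tool. You would launch it with something like `srun --resv-ports -n 4 python3 check_ports.py` (script name hypothetical), and each task prints its `SLURM_PROCID` rank and whatever reserved ports it can see. Two things here are assumptions to verify against your version's srun man page. One is the value format, a comma-separated list of ports or `first-last` ranges. The other is the second variable name, `SLURM_STEP_RESV_PORTS`. On the Slurm side, the reservable range comes from `MpiParams=ports=first-last` in slurm.conf.

```python
import os
from typing import List, Mapping, Optional

# Assumed variable names: the one reported in the thread, then a
# step-level name. Adjust to whatever your srun actually exports.
PORT_VARS = ("SLURM_RESV_PORTS", "SLURM_STEP_RESV_PORTS")


def parse_ports(value: str) -> List[int]:
    """Parse an assumed 'a-b' or 'a,b-c' port list into individual ports."""
    ports: List[int] = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError(f"descending port range: {part!r}")
            ports.extend(range(first, last + 1))
        else:
            ports.append(int(part))
    return ports


def reserved_ports(env: Optional[Mapping[str, str]] = None) -> Optional[List[int]]:
    """Return the parsed ports from the first variable found, or None."""
    env = os.environ if env is None else env
    for name in PORT_VARS:
        value = env.get(name)
        if value is not None:
            return parse_ports(value)
    return None


if __name__ == "__main__":
    rank = os.environ.get("SLURM_PROCID", "?")
    ports = reserved_ports()
    if ports is None:
        print(f"rank {rank}: no reserved-port variable visible")
    else:
        print(f"rank {rank}: {len(ports)} reserved port(s): {ports}")
```

Running the check inside each task, rather than in the launching shell, is the point. A variable that appears in the shell but not in the tasks would match the failure Ralph suspected.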