On Fri, Apr 29, 2011 at 10:01 AM, Michael Di Domenico
<mdidomeni...@gmail.com> wrote:
> On Fri, Apr 29, 2011 at 4:52 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> Hi Michael
>>
>> Please see the attached updated patch to try for 1.5.3. I mistakenly free'd
>> the envar after adding it to the environ :-/
>
> The patch works great, i can now see the precondition environment
> variable if i do
>
> mpirun -n 2 -host node1 <prog>
>
> and my <prog> runs just fine, However if i do
>
> srun --resv-ports -n 2 -w node1 <prog>
>
> I get
>
> [node1:16780] PSM EP connect error (unknown connect error):
> [node1:16780] node1
> [node1:16780] PSM EP connect error (Endpoint could not be reached):
> [node1:16780] node1
>
> PML add procs failed
> --> Returned "Error" (-1) instead of "Success" (0)
>
> I did notice a difference in the precondition env variable between the two
> runs
>
> mpirun -n 2 -host node1 <prog>
>
> sets precondition_transports=fbc383997ee1b668-00d40f1401d2e827 (which
> changes with each run (aka random))
>
> srun --resv-ports -n 2 -w node1 <prog>

This should have been "srun --resv-ports -n 1 -w node1 <prog>". I can't run
a 2-rank job; I get the PML error above.

> sets precondition_transports=0000184500000000-0000000100000000 (which
> doesn't seem to change run to run)
>
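
To compare what each launcher hands to the ranks, a small script along these lines could be started in place of <prog> under both mpirun and srun. It is only a sketch: it assumes the variable's name contains "precondition_transports" and that the value is two 16-digit hex fields joined by a hyphen, as in the two values quoted above. The helper names are made up for illustration.

```python
import os
import re

# Assumed format, inferred from the two values quoted above:
# two 16-digit hexadecimal fields joined by a hyphen.
KEY_RE = re.compile(r"^([0-9a-fA-F]{16})-([0-9a-fA-F]{16})$")


def find_transport_keys(environ):
    """Return {name: value} for variables whose name mentions precondition_transports."""
    return {
        name: value
        for name, value in environ.items()
        if "precondition_transports" in name.lower()
    }


def parse_key(value):
    """Split a key into two integers, or return None if the assumed format does not match."""
    match = KEY_RE.match(value)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


if __name__ == "__main__":
    # Print every matching variable as seen by this process, plus the parsed fields.
    for name, value in sorted(find_transport_keys(os.environ).items()):
        print(f"{name}={value} parsed={parse_key(value)}")
```

Running it a few times under each launcher would show directly whether the value changes from run to run.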