On Fri, Apr 29, 2011 at 10:01 AM, Michael Di Domenico
<mdidomeni...@gmail.com> wrote:
> On Fri, Apr 29, 2011 at 4:52 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> Hi Michael
>>
>> Please see the attached updated patch to try for 1.5.3. I mistakenly free'd 
>> the envar after adding it to the environ :-/
>
> The patch works great; I can now see the precondition environment
> variable if I do
>
> mpirun -n 2 -host node1 <prog>
>
> and my <prog> runs just fine. However, if I do
>
> srun --resv-ports -n 2 -w node1 <prog>
>
> I get
>
> [node1:16780] PSM EP connect error (unknown connect error):
> [node1:16780]  node1
> [node1:16780] PSM EP connect error (Endpoint could not be reached):
> [node1:16780]  node1
>
> PML add procs failed
> --> Returned "Error" (-1) instead of "Success" (0)
>
> I did notice a difference in the precondition env variable between the two 
> runs
>
> mpirun -n 2 -host node1 <prog>
>
> sets precondition_transports=fbc383997ee1b668-00d40f1401d2e827 (which
> changes with each run, i.e. it looks random)
>
> srun --resv-ports -n 2 -w node1 <prog>

This should have been "srun --resv-ports -n 1 -w node1 <prog>"; I
can't run a 2-rank job, because I get the PML error above.

>
> sets precondition_transports=0000184500000000-0000000100000000 (which
> doesn't seem to change run to run)
>
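
To make the difference between the two values easier to see, here is a
minimal sketch that splits each one into its fields. It assumes only what
the two samples above show: two 16-digit hex fields joined by a hyphen.
The helper names (parse_transport_key, split_words) are made up for
illustration and are not part of Open MPI or SLURM.

```python
def parse_transport_key(value):
    """Split 'hhhhhhhhhhhhhhhh-hhhhhhhhhhhhhhhh' into two 64-bit integers.

    The format is an assumption based on the two sample values only.
    """
    parts = value.split("-")
    if len(parts) != 2 or any(len(p) != 16 for p in parts):
        raise ValueError(f"unexpected key format: {value!r}")
    return tuple(int(p, 16) for p in parts)


def split_words(half):
    """Return the (upper, lower) 32-bit words of a 64-bit integer."""
    return half >> 32, half & 0xFFFFFFFF


# The two values reported above, copied verbatim.
samples = {
    "mpirun": "fbc383997ee1b668-00d40f1401d2e827",
    "srun": "0000184500000000-0000000100000000",
}

for launcher, value in samples.items():
    words = [split_words(half) for half in parse_transport_key(value)]
    print(launcher, [(hex(hi), hex(lo)) for hi, lo in words])
```

In the srun value, only the upper 32-bit word of each half is non-zero
(0x1845 and 0x1), whereas the mpirun value has bits set throughout. That
is only an observation on these two samples, but it fits the srun value
not changing from run to run.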
