On Feb 6, 2013, at 2:59 PM, Eugene Loh <eugene....@oracle.com> wrote:

> On 02/06/13 04:29, Siegmar Gross wrote:
>> thank you very much for your answer. I have compiled your program
>> and get different behaviours for openmpi-1.6.4rc3 and openmpi-1.9.
>> I get the following output for openmpi-1.9 (different outputs !!!).
>> sunpc1 rankfiles 104 mpirun --report-bindings --rankfile myrankfile ./a.out
>> [sunpc1:26554] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
>> unbound
>> sunpc1 rankfiles 105 mpirun --report-bindings --rankfile myrankfile_0 ./a.out
>> [sunpc1:26557] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/.][./.]
>> bind to 0
> 
> I think what's happening is that although you specified "0:0" or "0:1" in the 
> rankfile, the string "0,0" or "0,1" is getting passed in (at least in the 
> runs I looked at).  That colon became a comma.  So, it's just by accident 
> that myrankfile_0 is working out all right.
> 
> Could someone who knows the code better than I do help me narrow this down?  
> E.g., where is the rankfile parsed?  For what it's worth, by the time mpirun 
> reaches orte_odls_base_default_get_add_procs_data(), orte_job_data already 
> contains the corrupted cpu_bitmap string.

You'll want to look at orte/mca/rmaps/rank_file/rmaps_rank_file.c. The bitmap 
is now computed in mpirun and then sent to the daemons.
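
To make Eugene's diagnosis concrete, here is a minimal sketch of why a colon 
turning into a comma would produce the two bindings reported above. It is not 
taken from the Open MPI source: parse_slot(), corrupt() and CORES_PER_SOCKET 
are hypothetical names, and the toy grammar ("S:C" means core C on socket S, 
"A,B" means a list of logical cores) is an assumption made for illustration. 
The rankfile contents are not shown in this thread, so the inputs "0:0" and 
"0:1" are simply the two values Eugene mentions.

```python
# Assumption: two cores per socket, matching the [./.][./.] layout reported above.
CORES_PER_SOCKET = 2


def parse_slot(spec):
    """Return the set of (socket, core) pairs selected by a slot spec.

    Toy grammar for illustration only, not Open MPI's parser:
      "S:C"     -> core C on socket S
      "A,B,..." -> logical cores A, B, ... numbered across sockets
    """
    if ":" in spec:
        socket, core = spec.split(":")
        return {(int(socket), int(core))}
    # divmod maps a logical core number to (socket, core-within-socket).
    return {divmod(int(c), CORES_PER_SOCKET) for c in spec.split(",")}


def corrupt(spec):
    """Mimic the suspected bug: the colon is replaced by a comma."""
    return spec.replace(":", ",")


for intended in ("0:0", "0:1"):
    received = corrupt(intended)
    print(intended, sorted(parse_slot(intended)),
          "->", received, sorted(parse_slot(received)))
```

Under these assumptions "0:0" becomes "0,0", which still selects only core 0 
of socket 0, so that case comes out right by accident. "0:1" becomes "0,1", 
which selects cores 0 and 1, consistent with the [B/B][./.] binding and the 
"unbound" result in the first run.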