Hi,

I applied your patch "rmaps.diff" to openmpi-1.6.4rc3r27923, and it works for my previous rankfile.
> #3493: Handle the case where rankfile provides the allocation
> -----------------------------------+----------------------------
>  Reporter:  rhc                    |      Owner:  jsquyres
>      Type:  changeset move request |     Status:  new
>  Priority:  critical               |  Milestone:  Open MPI 1.6.4
>   Version:  trunk                  |   Keywords:
> -----------------------------------+----------------------------
>  Please apply the attached patch that corrects the rmaps function for
>  obtaining the available nodes when rankfile is providing the allocation.

tyr rankfiles 129 more rf_linpc1
# mpiexec -report-bindings -rf rf_linpc1 hostname
rank 0=linpc1 slot=0:0-1,1:0-1

tyr rankfiles 130 mpiexec -report-bindings -rf rf_linpc1 hostname
[linpc1:31603] MCW rank 0 bound to socket 0[core 0-1] socket 1[core 0-1]:
  [B B][B B] (slot list 0:0-1,1:0-1)

Unfortunately, I don't get the expected result for the following rankfile.

tyr rankfiles 114 more rf_bsp
# mpiexec -report-bindings -rf rf_bsp hostname
rank 0=linpc1 slot=0:0-1,1:0-1
rank 1=sunpc1 slot=0:0-1
rank 2=sunpc1 slot=1:0
rank 3=sunpc1 slot=1:1

I would expect rank 0 to get all four cores of linpc1, rank 1 both cores of socket 0 of sunpc1, rank 2 core 0 of socket 1 of sunpc1, and rank 3 core 1 of socket 1 of sunpc1. The bindings for ranks 0 and 1 are correct, but ranks 2 and 3 are wrong: each of them is bound to all four cores of sunpc1. Is something wrong with my rankfile, or with your mapping of processes to cores? I have removed the output from "hostname" and wrapped long lines.

tyr rankfiles 115 mpiexec -report-bindings -rf rf_bsp hostname
[linpc1:31092] MCW rank 0 bound to socket 0[core 0-1] socket 1[core 0-1]:
  [B B][B B] (slot list 0:0-1,1:0-1)
[sunpc1:12916] MCW rank 1 bound to socket 0[core 0-1]:
  [B B][. .] (slot list 0:0-1)
[sunpc1:12916] MCW rank 2 bound to socket 0[core 0-1] socket 1[core 0-1]:
  [B B][B B] (slot list 1:0)
[sunpc1:12916] MCW rank 3 bound to socket 0[core 0-1] socket 1[core 0-1]:
  [B B][B B] (slot list 1:1)

I get the following output if I add the options you mentioned in a previous email.

tyr rankfiles 124 mpiexec -report-bindings -rf rf_bsp \
  -display-allocation -mca ras_base_verbose 5 hostname
[tyr.informatik.hs-fulda.de:19401] mca:base:select:( ras) Querying component [cm]
[tyr.informatik.hs-fulda.de:19401] mca:base:select:( ras) Skipping component [cm].
  Query failed to return a module
[tyr.informatik.hs-fulda.de:19401] mca:base:select:( ras) No component selected!
[tyr.informatik.hs-fulda.de:19401] [[27101,0],0] ras:base:allocate
[tyr.informatik.hs-fulda.de:19401] [[27101,0],0] ras:base:allocate
  nothing found in module - proceeding to hostfile
[tyr.informatik.hs-fulda.de:19401] [[27101,0],0] ras:base:allocate
  parsing default hostfile /usr/local/openmpi-1.6.4_64_cc/etc/openmpi-default-hostfile
[tyr.informatik.hs-fulda.de:19401] [[27101,0],0] ras:base:allocate
  nothing found in hostfiles or dash-host - checking for rankfile
[tyr.informatik.hs-fulda.de:19401] [[27101,0],0] ras:base:node_insert inserting 2 nodes
[tyr.informatik.hs-fulda.de:19401] [[27101,0],0] ras:base:node_insert node linpc1
[tyr.informatik.hs-fulda.de:19401] [[27101,0],0] ras:base:node_insert node sunpc1

======================   ALLOCATED NODES   ======================

 Data for node: tyr.informatik.hs-fulda.de  Num slots: 0  Max slots: 0
 Data for node: linpc1  Num slots: 1  Max slots: 0
 Data for node: sunpc1  Num slots: 3  Max slots: 0

=================================================================
[linpc1:31532] MCW rank 0 bound to socket 0[core 0-1] socket 1[core 0-1]:
  [B B][B B] (slot list 0:0-1,1:0-1)
[sunpc1:13136] MCW rank 1 bound to socket 0[core 0-1]:
  [B B][. .] (slot list 0:0-1)
[sunpc1:13136] MCW rank 2 bound to socket 0[core 0-1] socket 1[core 0-1]:
  [B B][B B] (slot list 1:0)
[sunpc1:13136] MCW rank 3 bound to socket 0[core 0-1] socket 1[core 0-1]:
  [B B][B B] (slot list 1:1)

Thank you very much in advance for any suggestions and help.

Kind regards

Siegmar
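For comparison, the bindings that rf_bsp asks for can be derived mechanically from its slot lists. The following is a minimal Python sketch and is not part of Open MPI: the helper names (parse_range, slot_list_to_mask) are hypothetical, it only handles the "socket:core" and "socket:core-core" forms used in rf_bsp, and it assumes every node has two sockets with two cores each, as the -report-bindings output above suggests. It prints each rank's expected mask in the same [B B][. .] notation that mpiexec uses.

```python
# Hypothetical helper, not Open MPI code: convert a rankfile slot list such as
# "0:0-1,1:0-1" into the [B B][. .] notation printed by -report-bindings.
# Assumption: every node has 2 sockets with 2 cores each.

SOCKETS = 2
CORES_PER_SOCKET = 2


def parse_range(text):
    """Expand a core range: "0-1" -> [0, 1], "1" -> [1]."""
    if "-" in text:
        lo, hi = text.split("-")
        return list(range(int(lo), int(hi) + 1))
    return [int(text)]


def slot_list_to_mask(slot_list):
    """Return the expected binding mask for comma-separated socket:cores items."""
    bound = [[False] * CORES_PER_SOCKET for _ in range(SOCKETS)]
    for item in slot_list.split(","):
        socket, cores = item.split(":")
        for core in parse_range(cores):
            bound[int(socket)][core] = True
    # One bracket group per socket, "B" for a bound core, "." otherwise.
    return "".join(
        "[" + " ".join("B" if b else "." for b in sock) + "]" for sock in bound
    )


# The contents of rf_bsp, without the comment line.
RANKFILE = """\
rank 0=linpc1 slot=0:0-1,1:0-1
rank 1=sunpc1 slot=0:0-1
rank 2=sunpc1 slot=1:0
rank 3=sunpc1 slot=1:1
"""

for line in RANKFILE.splitlines():
    rank_host, slot = line.split(" slot=")
    rank, host = rank_host.split()[1].split("=")
    print(f"rank {rank} on {host}: {slot_list_to_mask(slot)}")
```

With this assumption, ranks 2 and 3 come out as [. .][B .] and [. .][. B], whereas mpiexec reports [B B][B B] for both.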