Hi,

I just found out that I get no segmentation fault or bus error if I
add "-display-devel-map" to the commands:
rs0 fd1026 110 mpiexec -report-bindings -np 3 -bind-to hwthread -display-devel-map date

 Mapper requested: NULL  Last mapper: round_robin
 Mapping policy: BYSLOT  Ranking policy: SLOT
 Binding policy: HWTHREAD[HWTHREAD]  Cpu set: NULL  PPR: NULL
 Num new daemons: 0  New daemon starting vpid INVALID
 Num nodes: 1

 Data for node: rs0.informatik.hs-fulda.de  Launch id: -1  State: 2
        Daemon: [[10411,0],0]  Daemon launched: True
        Num slots: 1  Slots in use: 1  Oversubscribed: TRUE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 3  Next node_rank: 3
        Data for proc: [[10411,1],0]
                Pid: 0  Local rank: 0  Node rank: 0  App rank: 0
                State: INITIALIZED  Restarts: 0  App_context: 0
                Locale: 0-15  Binding: 0[0]
        Data for proc: [[10411,1],1]
                Pid: 0  Local rank: 1  Node rank: 1  App rank: 1
                State: INITIALIZED  Restarts: 0  App_context: 0
                Locale: 0-15  Binding: 2[2]
        Data for proc: [[10411,1],2]
                Pid: 0  Local rank: 2  Node rank: 2  App rank: 2
                State: INITIALIZED  Restarts: 0  App_context: 0
                Locale: 0-15  Binding: 4[4]
[rs0.informatik.hs-fulda.de:20492] MCW rank 0 bound to : [B./../../..][../../../..]
[rs0.informatik.hs-fulda.de:20492] MCW rank 1 bound to : [../B./../..][../../../..]
[rs0.informatik.hs-fulda.de:20492] MCW rank 2 bound to : [../../B./..][../../../..]
Mon Sep 17 14:20:50 CEST 2012
Mon Sep 17 14:20:50 CEST 2012
Mon Sep 17 14:20:50 CEST 2012

rs0 fd1026 111 mpiexec -report-bindings -np 2 -bynode -bind-to hwthread -display-devel-map date

 Mapper requested: NULL  Last mapper: round_robin
 Mapping policy: BYNODE  Ranking policy: NODE
 Binding policy: HWTHREAD[HWTHREAD]  Cpu set: NULL  PPR: NULL
 Num new daemons: 0  New daemon starting vpid INVALID
 Num nodes: 1

 Data for node: rs0.informatik.hs-fulda.de  Launch id: -1  State: 2
        Daemon: [[10417,0],0]  Daemon launched: True
        Num slots: 1  Slots in use: 1  Oversubscribed: TRUE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 2  Next node_rank: 2
        Data for proc: [[10417,1],0]
                Pid: 0  Local rank: 0  Node rank: 0  App rank: 0
                State: INITIALIZED  Restarts: 0  App_context: 0
                Locale: 0-15  Binding: 0[0]
        Data for proc: [[10417,1],1]
                Pid: 0  Local rank: 1  Node rank: 1  App rank: 1
                State: INITIALIZED  Restarts: 0  App_context: 0
                Locale: 0-15  Binding: 2[2]
[rs0.informatik.hs-fulda.de:20502] MCW rank 0 bound to : [B./../../..][../../../..]
[rs0.informatik.hs-fulda.de:20502] MCW rank 1 bound to : [../B./../..][../../../..]
Mon Sep 17 14:22:10 CEST 2012
Mon Sep 17 14:22:10 CEST 2012

Any ideas why an additional option "solves" the problem?
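My only guess so far: a crash that appears and disappears with an
unrelated option usually points to a memory error, e.g. a buffer that
is overrun somewhere near opal_hwloc_base_cset2str, whose visible
effect depends on whatever happens to sit next to the damaged bytes.
A contrived sketch of that effect -- plain C, not taken from the
Open MPI sources, all names made up:

    /* layout.c: an out-of-bounds write is undefined behavior; with a
     * typical compiler the overflow lands in whatever object happens
     * to be adjacent, so changing the surroundings changes or hides
     * the symptom. */
    #include <stdio.h>
    #include <string.h>

    struct layout {
        char buf[8];        /* too small for the string below      */
        char neighbor[16];  /* happens to absorb the overflow here */
    };

    int main(void)
    {
        struct layout l;
        strcpy(l.neighbor, "intact");
        strcpy(l.buf, "socket 0-15");   /* writes past buf[7] */
        printf("neighbor is now: %s\n", l.neighbor);  /* typically "-15" */
        return 0;
    }

Whether such a bug segfaults, dies with a bus error, or stays silent
depends only on the memory layout, and "-display-devel-map" plausibly
changes that layout.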
Kind regards

Siegmar


> I have installed openmpi-1.9a1r27342 on Solaris 10 with Oracle
> Solaris Studio compiler 12.3.
>
> rs0 fd1026 106 mpicc -showme
> cc -I/usr/local/openmpi-1.9_64_cc/include -mt -m64 \
>   -L/usr/local/openmpi-1.9_64_cc/lib64 -lmpi -lpicl -lm -lkstat \
>   -llgrp -lsocket -lnsl -lrt -lm
>
> I can run the following command.
>
> rs0 fd1026 107 mpiexec -report-bindings -np 2 -bind-to hwthread date
> [rs0.informatik.hs-fulda.de:19704] MCW rank 0 bound to : [B./../../..][../../../..]
> [rs0.informatik.hs-fulda.de:19704] MCW rank 1 bound to : [../B./../..][../../../..]
> Mon Sep 17 13:07:34 CEST 2012
> Mon Sep 17 13:07:34 CEST 2012
>
> I get a segmentation fault if I increase the number of processes to 3.
>
> rs0 fd1026 108 mpiexec -report-bindings -np 3 -bind-to hwthread date
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 0 with PID 19711 on node
> rs0.informatik.hs-fulda.de exited on signal 11 (Segmentation Fault).
> --------------------------------------------------------------------------
> [rs0:19713] *** Process received signal ***
> [rs0:19713] Signal: Segmentation Fault (11)
> [rs0:19713] Signal code: Invalid permissions (2)
> [rs0:19713] Failing at address: 1000002e8
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x282640
> /lib/sparcv9/libc.so.1:0xd8684
> /lib/sparcv9/libc.so.1:0xcc1f8
> /lib/sparcv9/libc.so.1:0xcc404
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x2c1488 [ Signal 11 (SEGV)]
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_hwloc_base_cset2str+0x28
> /usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0xab00
> /usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0xb7e4
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:orte_odls_base_default_launch_local+0xa20
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x2997f4
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x299a20
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_libevent2019_event_base_loop+0x1e8
> /usr/local/openmpi-1.9_64_cc/bin/orterun:orterun+0x1920
> /usr/local/openmpi-1.9_64_cc/bin/orterun:main+0x24
> /usr/local/openmpi-1.9_64_cc/bin/orterun:_start+0x12c
> [rs0:19713] *** End of error message ***
> ...
> (same output for the other two processes)
>
>
> If I add "-bynode", I get a bus error.
>
> rs0 fd1026 110 mpiexec -report-bindings -np 2 -bynode -bind-to hwthread date
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 0 with PID 19724 on node
> rs0.informatik.hs-fulda.de exited on signal 10 (Bus Error).
> --------------------------------------------------------------------------
> [rs0:19724] *** Process received signal ***
> [rs0:19724] Signal: Bus Error (10)
> [rs0:19724] Signal code: Invalid address alignment (1)
> [rs0:19724] Failing at address: 1
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x282640
> /lib/sparcv9/libc.so.1:0xd8684
> /lib/sparcv9/libc.so.1:0xcc1f8
> /lib/sparcv9/libc.so.1:0xcc404
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x2c147c [ Signal 10 (BUS)]
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_hwloc_base_cset2str+0x28
> /usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0xab00
> /usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0xb7e4
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:orte_odls_base_default_launch_local+0xa20
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x2997f4
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x299a20
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_libevent2019_event_base_loop+0x1e8
> /usr/local/openmpi-1.9_64_cc/bin/orterun:orterun+0x1920
> /usr/local/openmpi-1.9_64_cc/bin/orterun:main+0x24
> /usr/local/openmpi-1.9_64_cc/bin/orterun:_start+0x12c
> [rs0:19724] *** End of error message ***
> ...
> (same output for the other process)
>
>
> I get a segmentation fault for the following commands.
>
> mpiexec -report-bindings -np 2 -map-by slot -bind-to hwthread date
> mpiexec -report-bindings -np 2 -map-by numa -bind-to hwthread date
> mpiexec -report-bindings -np 2 -map-by node -bind-to hwthread date
>
>
> I get a bus error for the following command.
>
> mpiexec -report-bindings -np 2 -map-by socket -bind-to hwthread date
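A remark on the bus errors in the quoted traces: the signal code
"Invalid address alignment" together with the failing address 1
suggests that a pointer which is not aligned for its type is being
dereferenced. SPARC enforces alignment strictly, so code with such a
bug may run cleanly on x86 and still die with SIGBUS here. A minimal
sketch of that failure mode, again not taken from Open MPI:

    /* misalign.c: dereferencing a misaligned pointer is undefined
     * behavior; on sparcv9 it raises SIGBUS ("Bus Error"), while most
     * x86 machines silently tolerate it. */
    #include <stdio.h>

    int main(void)
    {
        long backing[2] = { 0, 0 };
        char *raw = (char *)backing;   /* 8-byte aligned start       */
        long *p = (long *)(raw + 1);   /* no longer aligned for long */
        printf("%ld\n", *p);           /* SIGBUS on SPARC            */
        return 0;
    }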
> The following commands work.
>
> rs0 fd1026 120 mpiexec -report-bindings -np 2 -map-by hwthread -bind-to hwthread date
> [rs0.informatik.hs-fulda.de:19788] MCW rank 0 bound to : [B./../../..][../../../..]
> [rs0.informatik.hs-fulda.de:19788] MCW rank 1 bound to : [.B/../../..][../../../..]
> Mon Sep 17 13:20:30 CEST 2012
> Mon Sep 17 13:20:30 CEST 2012
>
> rs0 fd1026 121 mpiexec -report-bindings -np 2 -map-by core -bind-to hwthread date
> [rs0.informatik.hs-fulda.de:19793] MCW rank 0 bound to : [B./../../..][../../../..]
> [rs0.informatik.hs-fulda.de:19793] MCW rank 1 bound to : [../B./../..][../../../..]
> Mon Sep 17 13:21:06 CEST 2012
> Mon Sep 17 13:21:06 CEST 2012
>
>
> I think that the following output is correct, because I have a Sun M4000
> server with two quad-core processors, each core supporting two hardware
> threads.
>
> rs0 fd1026 124 mpiexec -report-bindings -np 2 -map-by board -bind-to hwthread date
> --------------------------------------------------------------------------
> The specified mapping policy is not recognized:
>
>   Policy: BYBOARD
>
> Please check for a typo or ensure that the option is a supported one.
> --------------------------------------------------------------------------
>
>
> In my opinion I should be able to start and bind up to 16 processes
> if I map and bind to hwthreads, shouldn't I? Thank you very much for
> any help in advance.
>
>
> Kind regards
>
> Siegmar
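P.S. To double-check the expectation of 16 hardware threads from the
quoted mail, one can count the PUs with the hwloc API (the library
Open MPI uses internally for binding). A small sketch, assuming the
hwloc headers and library are installed on their own:

    /* count_pus.c: print the number of PUs (hardware threads) that
     * hwloc sees; on the M4000 above (2 sockets x 4 cores x 2 threads
     * per core) this should print 16.
     * Build with e.g.: cc count_pus.c -lhwloc */
    #include <stdio.h>
    #include <hwloc.h>

    int main(void)
    {
        hwloc_topology_t topo;

        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);
        printf("hardware threads (PUs): %d\n",
               hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU));
        hwloc_topology_destroy(topo);
        return 0;
    }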