Hi,

I just found out that I get no segmentation fault or bus error if I
add "-display-devel-map" to the commands.

rs0 fd1026 110 mpiexec -report-bindings -np 3 -bind-to hwthread -display-devel-map date

 Mapper requested: NULL  Last mapper: round_robin  Mapping policy: BYSLOT  Ranking policy: SLOT  Binding policy: HWTHREAD[HWTHREAD]  Cpu set: NULL  PPR: NULL
        Num new daemons: 0      New daemon starting vpid INVALID
        Num nodes: 1

 Data for node: rs0.informatik.hs-fulda.de              Launch id: -1   State: 2
        Daemon: [[10411,0],0]   Daemon launched: True
        Num slots: 1    Slots in use: 1 Oversubscribed: TRUE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 3    Next node_rank: 3
        Data for proc: [[10411,1],0]
                Pid: 0  Local rank: 0   Node rank: 0    App rank: 0
                State: INITIALIZED      Restarts: 0     App_context: 0  Locale: 0-15    Binding: 0[0]
        Data for proc: [[10411,1],1]
                Pid: 0  Local rank: 1   Node rank: 1    App rank: 1
                State: INITIALIZED      Restarts: 0     App_context: 0  Locale: 0-15    Binding: 2[2]
        Data for proc: [[10411,1],2]
                Pid: 0  Local rank: 2   Node rank: 2    App rank: 2
                State: INITIALIZED      Restarts: 0     App_context: 0  Locale: 0-15    Binding: 4[4]
[rs0.informatik.hs-fulda.de:20492] MCW rank 0 bound to : [B./../../..][../../../..]
[rs0.informatik.hs-fulda.de:20492] MCW rank 1 bound to : [../B./../..][../../../..]
[rs0.informatik.hs-fulda.de:20492] MCW rank 2 bound to : [../../B./..][../../../..]
Mon Sep 17 14:20:50 CEST 2012
Mon Sep 17 14:20:50 CEST 2012
Mon Sep 17 14:20:50 CEST 2012



rs0 fd1026 111 mpiexec -report-bindings -np 2 -bynode -bind-to hwthread -display-devel-map date

 Mapper requested: NULL  Last mapper: round_robin  Mapping policy: BYNODE  Ranking policy: NODE  Binding policy: HWTHREAD[HWTHREAD]  Cpu set: NULL  PPR: NULL
        Num new daemons: 0      New daemon starting vpid INVALID
        Num nodes: 1

 Data for node: rs0.informatik.hs-fulda.de              Launch id: -1   State: 2
        Daemon: [[10417,0],0]   Daemon launched: True
        Num slots: 1    Slots in use: 1 Oversubscribed: TRUE
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 2    Next node_rank: 2
        Data for proc: [[10417,1],0]
                Pid: 0  Local rank: 0   Node rank: 0    App rank: 0
                State: INITIALIZED      Restarts: 0     App_context: 0  Locale: 0-15    Binding: 0[0]
        Data for proc: [[10417,1],1]
                Pid: 0  Local rank: 1   Node rank: 1    App rank: 1
                State: INITIALIZED      Restarts: 0     App_context: 0  Locale: 0-15    Binding: 2[2]
[rs0.informatik.hs-fulda.de:20502] MCW rank 0 bound to : [B./../../..][../../../..]
[rs0.informatik.hs-fulda.de:20502] MCW rank 1 bound to : [../B./../..][../../../..]
Mon Sep 17 14:22:10 CEST 2012
Mon Sep 17 14:22:10 CEST 2012


Any ideas why an additional option "solves" the problem?
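
For what it's worth, a crash that disappears when an unrelated option is added usually points at memory corruption: the extra code path shifts the heap or stack layout so that the bad access happens to land on mapped, properly aligned memory. Here is a minimal sketch in plain C (my own illustration, not Open MPI code) of how an extra allocation can hide an out-of-bounds write:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustration only: an out-of-bounds write whose visible effect
 * depends on what happens to sit next to the buffer in memory. */
int main(int argc, char **argv)
{
    char *buf = malloc(8);
    char *extra = NULL;

    if (buf == NULL) return 1;

    /* A "debug option" that merely allocates more memory can shift
     * the heap layout and thereby hide (or expose) the overrun. */
    if (argc > 1 && strcmp(argv[1], "-debug") == 0) {
        extra = malloc(64);
    }

    memset(buf, 'x', 9);   /* writes one byte past the end of buf: UB */
    printf("no visible error\n");

    free(extra);
    free(buf);
    return 0;
}

So my guess is that "-display-devel-map" does not actually fix anything; it only moves whatever the failing code stumbles over.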


Kind regards

Siegmar



> I have installed openmpi-1.9a1r27342 on Solaris 10 with Oracle
> Solaris Studio compiler 12.3.
> 
> rs0 fd1026 106 mpicc -showme
> cc -I/usr/local/openmpi-1.9_64_cc/include -mt -m64 \
>    -L/usr/local/openmpi-1.9_64_cc/lib64 -lmpi -lpicl -lm -lkstat \
>    -llgrp -lsocket -lnsl -lrt -lm
> 
> I can run the following command.
> 
> rs0 fd1026 107 mpiexec -report-bindings -np 2 -bind-to hwthread date
> [rs0.informatik.hs-fulda.de:19704] MCW rank 0 bound to : [B./../../..][../../../..]
> [rs0.informatik.hs-fulda.de:19704] MCW rank 1 bound to : [../B./../..][../../../..]
> Mon Sep 17 13:07:34 CEST 2012
> Mon Sep 17 13:07:34 CEST 2012
> 
> I get a segmentation fault if I increase the number of processes to 3.
> 
> rs0 fd1026 108 mpiexec -report-bindings -np 3 -bind-to hwthread date
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 0 with PID 19711 on node
>   rs0.informatik.hs-fulda.de exited on signal 11 (Segmentation Fault).
> --------------------------------------------------------------------------
> [rs0:19713] *** Process received signal ***
> [rs0:19713] Signal: Segmentation Fault (11)
> [rs0:19713] Signal code: Invalid permissions (2)
> [rs0:19713] Failing at address: 1000002e8
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x282640
> /lib/sparcv9/libc.so.1:0xd8684
> /lib/sparcv9/libc.so.1:0xcc1f8
> /lib/sparcv9/libc.so.1:0xcc404
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x2c1488 [ Signal 11 (SEGV)]
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_hwloc_base_cset2str+0x28
> /usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0xab00
> /usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0xb7e4
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:orte_odls_base_default_launch_local+0xa20
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x2997f4
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x299a20
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_libevent2019_event_base_loop+0x1e8
> /usr/local/openmpi-1.9_64_cc/bin/orterun:orterun+0x1920
> /usr/local/openmpi-1.9_64_cc/bin/orterun:main+0x24
> /usr/local/openmpi-1.9_64_cc/bin/orterun:_start+0x12c
> [rs0:19713] *** End of error message ***
> ...
> (same output for the other two processes)
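
Note that the trace fails at opal_hwloc_base_cset2str+0x28, i.e. very early in the routine that renders a cpuset as the bracketed map printed by -report-bindings. As a rough sketch of the kind of traversal such a routine performs (written against plain hwloc calls; the name sketch_cset2str and the exact layout are mine, this is not Open MPI's actual implementation):

#include <hwloc.h>
#include <stdio.h>

/* Sketch: render a cpuset as "[B./../../..][../../../..]", one bracket
 * pair per socket, cores separated by '/', one character per hardware
 * thread ('B' if bound, '.' if not). */
static void sketch_cset2str(char *out, size_t len,
                            hwloc_topology_t topo, hwloc_bitmap_t cset)
{
    size_t pos = 0;
    int nsock = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_SOCKET);

    for (int s = 0; s < nsock && pos + 2 < len; s++) {
        hwloc_obj_t sock = hwloc_get_obj_by_type(topo, HWLOC_OBJ_SOCKET, s);
        unsigned ncore = hwloc_get_nbobjs_inside_cpuset_by_type(
                             topo, sock->cpuset, HWLOC_OBJ_CORE);
        out[pos++] = '[';
        for (unsigned c = 0; c < ncore; c++) {
            hwloc_obj_t core = hwloc_get_obj_inside_cpuset_by_type(
                                   topo, sock->cpuset, HWLOC_OBJ_CORE, c);
            unsigned npu = hwloc_get_nbobjs_inside_cpuset_by_type(
                               topo, core->cpuset, HWLOC_OBJ_PU);
            if (c > 0 && pos < len) out[pos++] = '/';
            for (unsigned p = 0; p < npu && pos < len; p++) {
                hwloc_obj_t pu = hwloc_get_obj_inside_cpuset_by_type(
                                     topo, core->cpuset, HWLOC_OBJ_PU, p);
                out[pos++] = hwloc_bitmap_isset(cset, pu->os_index) ? 'B' : '.';
            }
        }
        if (pos < len) out[pos++] = ']';
    }
    out[pos < len ? pos : len - 1] = '\0';
}

int main(void)
{
    hwloc_topology_t topo;
    hwloc_bitmap_t cset;
    char buf[128];

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);
    cset = hwloc_bitmap_alloc();
    hwloc_bitmap_set(cset, 0);   /* pretend we are bound to PU 0 */
    sketch_cset2str(buf, sizeof(buf), topo, cset);
    printf("%s\n", buf);
    hwloc_bitmap_free(cset);
    hwloc_topology_destroy(topo);
    return 0;
}

A fault at such a small offset usually means one of the object pointers or the cpuset argument is already bogus on entry, which would fit the garbage failing address 1000002e8 above.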
> 
> 
> If I add "-bynode" I get a bus error.
> 
> rs0 fd1026 110 mpiexec -report-bindings -np 2 -bynode -bind-to hwthread date
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 0 with PID 19724 on node
>   rs0.informatik.hs-fulda.de exited on signal 10 (Bus Error).
> --------------------------------------------------------------------------
> [rs0:19724] *** Process received signal ***
> [rs0:19724] Signal: Bus Error (10)
> [rs0:19724] Signal code: Invalid address alignment (1)
> [rs0:19724] Failing at address: 1
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x282640
> /lib/sparcv9/libc.so.1:0xd8684
> /lib/sparcv9/libc.so.1:0xcc1f8
> /lib/sparcv9/libc.so.1:0xcc404
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x2c147c [ Signal 10 (BUS)]
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_hwloc_base_cset2str+0x28
> /usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0xab00
> /usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0xb7e4
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:orte_odls_base_default_launch_local+0xa20
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x2997f4
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x299a20
> /usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_libevent2019_event_base_loop+0x1e8
> /usr/local/openmpi-1.9_64_cc/bin/orterun:orterun+0x1920
> /usr/local/openmpi-1.9_64_cc/bin/orterun:main+0x24
> /usr/local/openmpi-1.9_64_cc/bin/orterun:_start+0x12c
> [rs0:19724] *** End of error message ***
> ... 
> (same output for the other two processes)
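
The signal codes are also telling: signal code 2 (invalid permissions) for the segmentation fault above versus signal code 1 (invalid address alignment) at address 1 here. SPARC, unlike x86, traps misaligned accesses with SIGBUS, so the same wild pointer can surface as either signal depending on its value. A tiny illustration in plain C (assumption: compiled without optimization and run on SPARC):

#include <string.h>

int main(void)
{
    char buf[16];
    memset(buf, 0, sizeof(buf));

    /* Misaligned by one byte: on SPARC this 4-byte store raises
     * SIGBUS ("invalid address alignment"); x86 would tolerate it. */
    int *p = (int *)(buf + 1);
    *p = 42;
    return 0;
}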
> 
> 
> I get a segmentation fault for the following commands.
> 
> mpiexec -report-bindings -np 2 -map-by slot -bind-to hwthread date
> mpiexec -report-bindings -np 2 -map-by numa -bind-to hwthread date
> mpiexec -report-bindings -np 2 -map-by node -bind-to hwthread date
> 
> 
> I get a bus error for the following command.
> 
> mpiexec -report-bindings -np 2 -map-by socket -bind-to hwthread date
> 
> 
> The following commands work.
> 
> rs0 fd1026 120 mpiexec -report-bindings -np 2 -map-by hwthread -bind-to hwthread date
> [rs0.informatik.hs-fulda.de:19788] MCW rank 0 bound to : [B./../../..][../../../..]
> [rs0.informatik.hs-fulda.de:19788] MCW rank 1 bound to : [.B/../../..][../../../..]
> Mon Sep 17 13:20:30 CEST 2012
> Mon Sep 17 13:20:30 CEST 2012
> 
> rs0 fd1026 121 mpiexec -report-bindings -np 2 -map-by core -bind-to hwthread date
> [rs0.informatik.hs-fulda.de:19793] MCW rank 0 bound to : [B./../../..][../../../..]
> [rs0.informatik.hs-fulda.de:19793] MCW rank 1 bound to : [../B./../..][../../../..]
> Mon Sep 17 13:21:06 CEST 2012
> Mon Sep 17 13:21:06 CEST 2012
> 
> 
> I think that the following output is correct, because I have a Sun M4000
> server with two quad-core processors, each supporting two hardware threads.
> 
> rs0 fd1026 124 mpiexec -report-bindings -np 2 -map-by board -bind-to hwthread date
> --------------------------------------------------------------------------
> The specified mapping policy is not recognized:
> 
>   Policy: BYBOARD
> 
> Please check for a typo or ensure that the option is a supported
> one.
> --------------------------------------------------------------------------
> 
> 
> In my opinion I should be able to start and bind up to 16 processes
> if I map and bind to hwthreads, shouldn't I? Thank you very much for
> any help in advance.
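
That expectation sounds right: 2 sockets x 4 cores x 2 hardware threads = 16 PUs, which also matches the "Locale: 0-15" in the devel map at the top of this mail. A quick way to cross-check what hwloc itself sees, as a standalone sketch with no Open MPI involvement:

#include <hwloc.h>
#include <stdio.h>

/* Standalone sketch: count the objects hwloc detects.  On the M4000
 * described above this should print 16 PUs, 8 cores, 2 sockets. */
int main(void)
{
    hwloc_topology_t topo;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);
    printf("PUs:     %d\n", hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU));
    printf("cores:   %d\n", hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE));
    printf("sockets: %d\n", hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_SOCKET));
    hwloc_topology_destroy(topo);
    return 0;
}

If this prints 16 PUs, topology detection is fine and the problem is confined to the mapping/binding code.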
> 
> 
> Kind regards
> 
> Siegmar
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
