Hi,

I installed openmpi-1.9a1r27362 and my tests behave worse than with
openmpi-1.9a1r27342. When I run the commands that I reported in my email
from September 18th, I now get a segmentation fault.
The following commands worked in openmpi-1.9a1r27342, but I get
segmentation faults with "Address not mapped" in openmpi-1.9a1r27362.

mpiexec -report-bindings -np 4 -bynode -bind-to hwthread \
  -display-devel-map date
[rs0...:23490] MCW rank 1 bound to : [../B./../..][../../../..]
[rs0...:23490] MCW rank 2 bound to : [../../B./..][../../../..]
[rs0...:23490] MCW rank 3 bound to : [../../../B.][../../../..]
[rs0...:23490] MCW rank 0 bound to : [B./../../..][../../../..]

mpiexec -report-bindings -np 5 -map-by core -bind-to hwthread \
  -display-devel-map date
[rs0...:23619] MCW rank 3 bound to : [../../../B.][../../../..]
[rs0...:23619] MCW rank 4 bound to : [../../../..][B./../../..]
[rs0...:23619] MCW rank 0 bound to : [B./../../..][../../../..]
[rs0...:23619] MCW rank 1 bound to : [../B./../..][../../../..]
[rs0...:23619] MCW rank 2 bound to : [../../B./..][../../../..]

mpiexec -report-bindings -np 4 -map-by hwthread -bind-to hwthread \
  -display-devel-map date
[rs0...:23676] MCW rank 1 bound to : [.B/../../..][../../../..]
[rs0...:23676] MCW rank 2 bound to : [../B./../..][../../../..]
[rs0...:23676] MCW rank 3 bound to : [../.B/../..][../../../..]
[rs0...:23676] MCW rank 0 bound to : [B./../../..][../../../..]

mpiexec -report-bindings -np 2 -bind-to hwthread date
[rs0...:19704] MCW rank 0 bound to : [B./../../..][../../../..]
[rs0...:19704] MCW rank 1 bound to : [../B./../..][../../../..]

mpiexec -report-bindings -np 2 -map-by core -bind-to hwthread date
[rs0...:19793] MCW rank 0 bound to : [B./../../..][../../../..]
[rs0...:19793] MCW rank 1 bound to : [../B./../..][../../../..]

mpiexec -report-bindings -np 2 -map-by hwthread -bind-to hwthread date
[rs0...:19788] MCW rank 0 bound to : [B./../../..][../../../..]
[rs0...:19788] MCW rank 1 bound to : [.B/../../..][../../../..]

I still get a segmentation fault with "Address not mapped" for the
following commands.

mpiexec -report-bindings -np 2 -map-by slot -bind-to hwthread date
mpiexec -report-bindings -np 2 -map-by numa -bind-to hwthread date
mpiexec -report-bindings -np 2 -map-by node -bind-to hwthread date
mpiexec -report-bindings -np 5 -bynode -bind-to hwthread \
  -display-devel-map date
mpiexec -report-bindings -np 1 -map-by core -bind-to hwthread date
mpiexec -report-bindings -np 6 -map-by core -bind-to hwthread \
  -display-devel-map date
mpiexec -report-bindings -np 1 -map-by socket -bind-to hwthread date

I no longer get bus errors for the following commands, but now I get
segmentation faults with "Address not mapped".

mpiexec -report-bindings -np 2 -bynode -bind-to hwthread date
mpiexec -report-bindings -np 2 -map-by socket -bind-to hwthread date

Now I get a bus error with the following commands.

mpiexec -report-bindings -np 3 -bind-to hwthread date
mpiexec -report-bindings -np 1 -map-by hwthread -bind-to hwthread date
mpiexec -report-bindings -np 5 -map-by hwthread -bind-to hwthread \
  -display-devel-map date

The following command works.

mpiexec -report-bindings -np 1 -bynode -bind-to hwthread date
[rs0...:06795] MCW rank 0 bound to : [B./../../..][../../../..]

Why do I get output for "-bynode" but a bus error for "-map-by node"?
I thought they were the same.

rs0 topo 114 mpiexec -report-bindings -np 1 -bynode \
  -bind-to hwthread date
[rs0...:07108] MCW rank 0 bound to : [B./../../..][../../../..]
Wed Sep 26 12:23:11 CEST 2012

rs0 topo 115 mpiexec -report-bindings -np 1 -map-by node \
  -bind-to hwthread date
[rs0:07113] *** Process received signal ***
[rs0:07113] Signal: Bus Error (10)

The output sometimes differs yet again when I add the option
"-mca ess_base_verbose 5": in error_5a.txt everything is fine, for
example, while in error_5b.txt I get the above-mentioned bus error.
I have attached all files to keep the email readable. Hopefully somebody
can find out what is wrong and fix the problem.

mpiexec -report-bindings -np 4 -bynode -bind-to hwthread \
  -display-devel-map -mca ess_base_verbose 5 date >& error_1.txt
mpiexec -report-bindings -np 5 -map-by core -bind-to hwthread \
  -display-devel-map -mca ess_base_verbose 5 date >& error_2.txt
mpiexec -report-bindings -np 2 -map-by hwthread -bind-to hwthread \
  -mca ess_base_verbose 5 date >& error_3.txt
mpiexec -report-bindings -np 2 -map-by hwthread -bind-to hwthread \
  -mca ess_base_verbose 5 date >& error_4.txt
mpiexec -report-bindings -np 1 -map-by node \
  -bind-to hwthread -mca ess_base_verbose 5 date >& error_5a.txt
mpiexec -report-bindings -np 1 -map-by node \
  -bind-to hwthread date >& error_5b.txt

Thank you very much in advance for any help.

Kind regards

Siegmar


> Please try and keep the User list on the messages - allows others
> to chime in.
>
> You can see the topology by adding "-mca ess_base_verbose 5" to
> your command line. You'll get other stuff as well, and you'll
> need to --enable-debug in your configure.
>
> On Sep 24, 2012, at 4:47 AM, Siegmar Gross
> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>
> > Hi,
> >
> >> The 1.7 series has a completely different way of handling node
> >> topology than was used in the 1.6 series. It provides some
> >> enhanced features, but it does have some drawbacks in the case
> >> where the topology info isn't correct. I fear you are running
> >> into this problem (again).
> >>
> >> All the commands you show here work fine for me on a Linux
> >> x86_64 box using 1.7r27361 on a Westmere 6-core single-socket
> >> machine with hyperthreads enabled. I cannot replicate any of
> >> the reported problems, so there isn't much I can do at this point.
> >>
> >> As I've said before, the root problem here appears to be some
> >> hwloc-related issue with your setup. Until that gets resolved
> >> so we get correct topology info, I'm not sure what can be done
> >> to resolve what you are seeing. I'll raise the question of
> >> possibly providing some alternative support for setups like
> >> yours that just can't get topology info, but that would
> >> definitely be a long-term question.
> >
> > Can we check if you get wrong topology info or which info you get
> > at all? Can you tell me a file and location where I can print the
> > values of relevant variables on my architecture? Perhaps that can
> > help to determine what goes wrong. I would use the latest trunk
> > tarball and can make the test a day later, because all changes on
> > our "installation server" are mirrored in the night to a our file
> > server for all machines.
> >
> > Kind regards
> >
> > Siegmar
> >
> >> On Sep 23, 2012, at 3:20 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
> >>
> >>> Hi,
> >>>
> >>> yesterday I installed openmpi-1.7a1r27358 and it has an improved
> >>> error message compared to openmpi-1.6.2, but doesn't show process bindings
> >>> and has some other problems as well.
> >>>
> >>>
> >>> "sunpc0" and "linpc0" are equipped with two dual-core processors running
> >>> Solaris 10 x86_64 and Linux x86_64 resp. "tyr" is a dual-processor machine
> >>> running Solaris 10 Sparc.
> >>>
> >>> tyr fd1026 105 mpiexec -np 2 -host sunpc0 -report-bindings \
> >>>   -map-by core -bind-to-core date
> >>> Sun Sep 23 11:46:36 CEST 2012
> >>> Sun Sep 23 11:46:36 CEST 2012
> >>>
> >>> tyr fd1026 106 mpicc -showme
> >>> cc -I/usr/local/openmpi-1.7_64_cc/include -mt -m64
> >>>   -L/usr/local/openmpi-1.7_64_cc/lib64 -lmpi -lpicl -lm -lkstat -llgrp
> >>>   -lsocket -lnsl -lrt -lm
> >>>
> >>>
> >>> openmpi-1.6.2 shows process bindings.
> >>>
> >>> tyr fd1026 103 mpiexec -np 2 -host sunpc0 -report-bindings \
> >>>   -bycore -bind-to-core date
> >>> Sun Sep 23 12:09:06 CEST 2012
> >>> [sunpc0:13197] MCW rank 0 bound to socket 0[core 0]: [B .][. .]
> >>> [sunpc0:13197] MCW rank 1 bound to socket 0[core 1]: [. B][. .]
> >>> Sun Sep 23 12:09:06 CEST 2012
> >>>
> >>>
> >>> tyr fd1026 104 mpicc -showme
> >>> cc -I/usr/local/openmpi-1.6.2_64_cc/include -mt -m64
> >>>   -L/usr/local/openmpi-1.6.2_64_cc/lib64 -lmpi -lm -lkstat -llgrp
> >>>   -lsocket -lnsl -lrt -lm
> >>>
> >>>
> >>> On my Linux machine I get a warning.
> >>>
> >>> tyr fd1026 113 mpiexec -np 2 -host linpc0 -report-bindings \
> >>>   -map-by core -bind-to-core date
> >>> --------------------------------------------------------------------------
> >>> WARNING: a request was made to bind a process. While the system
> >>> supports binding the process itself, at least one node does NOT
> >>> support binding memory to the process location.
> >>>
> >>>   Node:  linpc0
> >>>
> >>> This is a warning only; your job will continue, though performance may
> >>> be degraded.
> >>> --------------------------------------------------------------------------
> >>> Sun Sep 23 11:56:04 CEST 2012
> >>> Sun Sep 23 11:56:04 CEST 2012
> >>>
> >>>
> >>> Everything works fine with openmpi-1.6.2.
> >>>
> >>> tyr fd1026 106 mpiexec -np 2 -host linpc0 -report-bindings \
> >>>   -bycore -bind-to-core date
> >>> [linpc0:15808] MCW rank 0 bound to socket 0[core 0]: [B .][. .]
> >>> [linpc0:15808] MCW rank 1 bound to socket 0[core 1]: [. B][. .]
> >>> Sun Sep 23 12:11:47 CEST 2012
> >>> Sun Sep 23 12:11:47 CEST 2012
> >>>
> >>>
> >>> Om my Solaris Sparc machine I get the following errors.
> >>>
> >>>
> >>> tyr fd1026 121 mpiexec -np 2 -report-bindings -map-by core -bind-to-core date
> >>> [tyr.informatik.hs-fulda.de:23773] [[32457,0],0] ORTE_ERROR_LOG: Value out of bounds in file ../../../../openmpi-1.7a1r27358/orte/mca/odls/base/odls_base_default_fns.c at line 847
> >>> [tyr.informatik.hs-fulda.de:23773] [[32457,0],0] ORTE_ERROR_LOG: Value out of bounds in file ../../../../openmpi-1.7a1r27358/orte/mca/odls/base/odls_base_default_fns.c at line 1414
> >>> [tyr.informatik.hs-fulda.de:23773] [[32457,0],0] ORTE_ERROR_LOG: Value out of bounds in file ../../../../openmpi-1.7a1r27358/orte/mca/odls/base/odls_base_default_fns.c at line 847
> >>> [tyr.informatik.hs-fulda.de:23773] [[32457,0],0] ORTE_ERROR_LOG: Value out of bounds in file ../../../../openmpi-1.7a1r27358/orte/mca/odls/base/odls_base_default_fns.c at line 1414
> >>>
> >>>
> >>> tyr fd1026 122 mpiexec -np 2 -host tyr -report-bindings -map-by core -bind-to core date
> >>> --------------------------------------------------------------------------
> >>> All nodes which are allocated for this job are already filled.
> >>> --------------------------------------------------------------------------
> >>>
> >>>
> >>> Once more everything works fine with openmpi-1.6.2.
> >>>
> >>> tyr fd1026 109 mpiexec -np 2 -report-bindings -bycore -bind-to-core date
> >>> [tyr.informatik.hs-fulda.de:23869] MCW rank 0 bound to socket 0[core 0]: [B][.]
> >>> [tyr.informatik.hs-fulda.de:23869] MCW rank 1 bound to socket 1[core 0]: [.][B]
> >>> Sun Sep 23 12:14:09 CEST 2012
> >>> Sun Sep 23 12:14:09 CEST 2012
> >>>
> >>> tyr fd1026 110 mpiexec -np 2 -host tyr -report-bindings -bycore -bind-to-core date
> >>> [tyr.informatik.hs-fulda.de:23877] MCW rank 0 bound to socket 0[core 0]: [B][.]
> >>> [tyr.informatik.hs-fulda.de:23877] MCW rank 1 bound to socket 1[core 0]: [.][B]
> >>> Sun Sep 23 12:16:05 CEST 2012
> >>> Sun Sep 23 12:16:05 CEST 2012
> >>>
> >>>
> >>> Kind regards
> >>>
> >>> Siegmar
> >>>
> >>> _______________________________________________
> >>> users mailing list
> >>> us...@open-mpi.org
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>
> >>
> >
> >
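
PS: Regarding the question in the quoted exchange about how to check
which topology information hwloc actually reports on this machine: one
way, independent of Open MPI, would be a small standalone hwloc program
along the following lines. This is only an untested sketch; it assumes
the hwloc headers and library are installed and that something like
"cc topo.c -lhwloc" links. The calls used are the public hwloc API, not
Open MPI internals.

  #include <stdio.h>
  #include <stdlib.h>
  #include <hwloc.h>

  /* Walk the complete hwloc tree and print the type and cpuset of every
     object, roughly what lstopo or Open MPI's "Topology Info" dump shows. */
  int main(void)
  {
      hwloc_topology_t topo;
      int d, depth;
      unsigned i, n;

      hwloc_topology_init(&topo);
      hwloc_topology_load(topo);

      depth = hwloc_topology_get_depth(topo);
      for (d = 0; d < depth; d++) {
          n = hwloc_get_nbobjs_by_depth(topo, d);
          for (i = 0; i < n; i++) {
              hwloc_obj_t obj = hwloc_get_obj_by_depth(topo, d, i);
              char *cpuset = NULL;
              if (obj->cpuset != NULL)
                  hwloc_bitmap_asprintf(&cpuset, obj->cpuset);
              printf("%*s%s cpuset=%s\n", 2 * d, "",
                     hwloc_obj_type_string(obj->type),
                     cpuset != NULL ? cpuset : "(none)");
              free(cpuset);
          }
      }
      hwloc_topology_destroy(topo);
      return 0;
  }

Comparing its output with the "Topology Info" sections in the attached
files below might show whether hwloc itself or Open MPI's use of the
topology is at fault.
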
[rs0.informatik.hs-fulda.de:07146] mca:base:select:( ess) Querying component [env]
[rs0.informatik.hs-fulda.de:07146] mca:base:select:( ess) Skipping component [env]. Query failed to return a module
[rs0.informatik.hs-fulda.de:07146] mca:base:select:( ess) Querying component [hnp]
[rs0.informatik.hs-fulda.de:07146] mca:base:select:( ess) Query of component [hnp] set priority to 100
[rs0.informatik.hs-fulda.de:07146] mca:base:select:( ess) Querying component [singleton]
[rs0.informatik.hs-fulda.de:07146] mca:base:select:( ess) Skipping component [singleton]. Query failed to return a module
[rs0.informatik.hs-fulda.de:07146] mca:base:select:( ess) Querying component [tool]
[rs0.informatik.hs-fulda.de:07146] mca:base:select:( ess) Skipping component [tool]. Query failed to return a module
[rs0.informatik.hs-fulda.de:07146] mca:base:select:( ess) Selected component [hnp]
[rs0.informatik.hs-fulda.de:07146] [[INVALID],INVALID] Topology Info:
[rs0.informatik.hs-fulda.de:07146] Type: Machine  Number of child objects: 1
  Name=NULL  total=33554432KB  OSName=SunOS  OSRelease=5.10  OSVersion=Generic_147440-21  Architecture=sun4u
  Cpuset: 0x0000ffff  Online: 0x0000ffff  Allowed: 0x0000ffff
  Bind CPU proc: TRUE  Bind CPU thread: TRUE  Bind MEM proc: TRUE  Bind MEM thread: TRUE
  Type: NUMANode  Number of child objects: 2  Name=NULL  local=33554432KB  total=33554432KB  Cpuset: 0x0000ffff  Online: 0x0000ffff  Allowed: 0x0000ffff
    Type: Socket  Number of child objects: 4  Name=NULL  CPUType=sparcv9  CPUModel=SPARC64_VII  Cpuset: 0x000000ff  Online: 0x000000ff  Allowed: 0x000000ff
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00000003  Online: 0x00000003  Allowed: 0x00000003
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000001  Online: 0x00000001  Allowed: 0x00000001
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000002  Online: 0x00000002  Allowed: 0x00000002
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x0000000c  Online: 0x0000000c  Allowed: 0x0000000c
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000004  Online: 0x00000004  Allowed: 0x00000004
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000008  Online: 0x00000008  Allowed: 0x00000008
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00000030  Online: 0x00000030  Allowed: 0x00000030
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000010  Online: 0x00000010  Allowed: 0x00000010
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000020  Online: 0x00000020  Allowed: 0x00000020
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x000000c0  Online: 0x000000c0  Allowed: 0x000000c0
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000040  Online: 0x00000040  Allowed: 0x00000040
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000080  Online: 0x00000080  Allowed: 0x00000080
    Type: Socket  Number of child objects: 4  Name=NULL  CPUType=sparcv9  CPUModel=SPARC64_VII  Cpuset: 0x0000ff00  Online: 0x0000ff00  Allowed: 0x0000ff00
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00000300  Online: 0x00000300  Allowed: 0x00000300
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000100  Online: 0x00000100  Allowed: 0x00000100
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000200  Online: 0x00000200  Allowed: 0x00000200
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00000c00  Online: 0x00000c00  Allowed: 0x00000c00
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000400  Online: 0x00000400  Allowed: 0x00000400
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000800  Online: 0x00000800  Allowed: 0x00000800
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00003000  Online: 0x00003000  Allowed: 0x00003000
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00001000  Online: 0x00001000  Allowed: 0x00001000
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00002000  Online: 0x00002000  Allowed: 0x00002000
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x0000c000  Online: 0x0000c000  Allowed: 0x0000c000
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00004000  Online: 0x00004000  Allowed: 0x00004000
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00008000  Online: 0x00008000  Allowed: 0x00008000
 Mapper requested: NULL  Last mapper: round_robin
 Mapping policy: BYNODE  Ranking policy: NODE  Binding policy: HWTHREAD[HWTHREAD]
 Cpu set: NULL  PPR: NULL
 Num new daemons: 0  New daemon starting vpid INVALID  Num nodes: 1
 Data for node: rs0.informatik.hs-fulda.de  Launch id: -1  State: 2
   Daemon: [[25421,0],0]  Daemon launched: True
   Num slots: 1  Slots in use: 1  Oversubscribed: TRUE
   Num slots allocated: 1  Max slots: 0
   Username on node: NULL
   Num procs: 4  Next node_rank: 4
   Data for proc: [[25421,1],0]  Pid: 0  Local rank: 0  Node rank: 0  App rank: 0
     State: INITIALIZED  Restarts: 0  App_context: 0  Locale: 0-15  Binding: 0[0]
   Data for proc: [[25421,1],1]  Pid: 0  Local rank: 1  Node rank: 1  App rank: 1
     State: INITIALIZED  Restarts: 0  App_context: 0  Locale: 0-15  Binding: 2[2]
   Data for proc: [[25421,1],2]  Pid: 0  Local rank: 2  Node rank: 2  App rank: 2
     State: INITIALIZED  Restarts: 0  App_context: 0  Locale: 0-15  Binding: 4[4]
   Data for proc: [[25421,1],3]  Pid: 0  Local rank: 3  Node rank: 3  App rank: 3
     State: INITIALIZED  Restarts: 0  App_context: 0  Locale: 0-15  Binding: 6[6]
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 7150 on node
rs0.informatik.hs-fulda.de exited on signal 11 (Segmentation Fault).
--------------------------------------------------------------------------
Wed Sep 26 12:38:53 CEST 2012
Wed Sep 26 12:38:53 CEST 2012
[rs0:07154] *** Process received signal ***
[rs0:07154] Signal: Bus Error (10)
[rs0:07154] Signal code: Invalid address alignment (1)
[rs0:07154] Failing at address: 620900000019
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_backtrace_print+0x14
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x503e30
/lib/sparcv9/libc.so.1:0xd8684
/lib/sparcv9/libc.so.1:0xcc1f8
/lib/sparcv9/libc.so.1:0xcc404
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x572eb0
[ Signal 2128894800 (?)]
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_hwloc_base_cset2str+0x64
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0x126f8
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0x135f0
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:orte_odls_base_default_launch_local+0x1e6c
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x53468c
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x5348b8
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_libevent2019_event_base_loop+0x1e8
/usr/local/openmpi-1.9_64_cc/bin/orterun:orterun+0x1ce4
/usr/local/openmpi-1.9_64_cc/bin/orterun:main+0x24
/usr/local/openmpi-1.9_64_cc/bin/orterun:_start+0x12c
[rs0:07154] *** End of error message ***
[rs0.informatik.hs-fulda.de:07146] MCW rank 0 bound to : [B./../../..][../../../..]
[rs0:07150] *** Process received signal ***
[rs0:07150] Signal: Segmentation Fault (11)
[rs0:07150] Signal code: Address not mapped (1)
[rs0:07150] Failing at address: 8
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_backtrace_print+0x14
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x503e30
/lib/sparcv9/libc.so.1:0xd8684
/lib/sparcv9/libc.so.1:0xcc1f8
/lib/sparcv9/libc.so.1:0xcc404
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x572eb0
[ Signal 2128894752 (?)]
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_hwloc_base_cset2str+0x64
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0x126f8
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0x135f0
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:orte_odls_base_default_launch_local+0x1e6c
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x53468c
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x5348b8
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_libevent2019_event_base_loop+0x1e8
/usr/local/openmpi-1.9_64_cc/bin/orterun:orterun+0x1ce4
/usr/local/openmpi-1.9_64_cc/bin/orterun:main+0x24
/usr/local/openmpi-1.9_64_cc/bin/orterun:_start+0x12c
[rs0:07150] *** End of error message ***
[rs0.informatik.hs-fulda.de:07146] MCW rank 2 bound to : [../../B./..][../../../..]

[rs0.informatik.hs-fulda.de:07155] mca:base:select:( ess) Querying component [env]
[rs0.informatik.hs-fulda.de:07155] mca:base:select:( ess) Skipping component [env]. Query failed to return a module
[rs0.informatik.hs-fulda.de:07155] mca:base:select:( ess) Querying component [hnp]
[rs0.informatik.hs-fulda.de:07155] mca:base:select:( ess) Query of component [hnp] set priority to 100
[rs0.informatik.hs-fulda.de:07155] mca:base:select:( ess) Querying component [singleton]
[rs0.informatik.hs-fulda.de:07155] mca:base:select:( ess) Skipping component [singleton]. Query failed to return a module
[rs0.informatik.hs-fulda.de:07155] mca:base:select:( ess) Querying component [tool]
[rs0.informatik.hs-fulda.de:07155] mca:base:select:( ess) Skipping component [tool]. Query failed to return a module
[rs0.informatik.hs-fulda.de:07155] mca:base:select:( ess) Selected component [hnp]
[rs0.informatik.hs-fulda.de:07155] [[INVALID],INVALID] Topology Info:
[rs0.informatik.hs-fulda.de:07155] Type: Machine  Number of child objects: 1
  Name=NULL  total=33554432KB  OSName=SunOS  OSRelease=5.10  OSVersion=Generic_147440-21  Architecture=sun4u
  Cpuset: 0x0000ffff  Online: 0x0000ffff  Allowed: 0x0000ffff
  Bind CPU proc: TRUE  Bind CPU thread: TRUE  Bind MEM proc: TRUE  Bind MEM thread: TRUE
  Type: NUMANode  Number of child objects: 2  Name=NULL  local=33554432KB  total=33554432KB  Cpuset: 0x0000ffff  Online: 0x0000ffff  Allowed: 0x0000ffff
    Type: Socket  Number of child objects: 4  Name=NULL  CPUType=sparcv9  CPUModel=SPARC64_VII  Cpuset: 0x000000ff  Online: 0x000000ff  Allowed: 0x000000ff
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00000003  Online: 0x00000003  Allowed: 0x00000003
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000001  Online: 0x00000001  Allowed: 0x00000001
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000002  Online: 0x00000002  Allowed: 0x00000002
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x0000000c  Online: 0x0000000c  Allowed: 0x0000000c
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000004  Online: 0x00000004  Allowed: 0x00000004
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000008  Online: 0x00000008  Allowed: 0x00000008
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00000030  Online: 0x00000030  Allowed: 0x00000030
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000010  Online: 0x00000010  Allowed: 0x00000010
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000020  Online: 0x00000020  Allowed: 0x00000020
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x000000c0  Online: 0x000000c0  Allowed: 0x000000c0
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000040  Online: 0x00000040  Allowed: 0x00000040
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000080  Online: 0x00000080  Allowed: 0x00000080
    Type: Socket  Number of child objects: 4  Name=NULL  CPUType=sparcv9  CPUModel=SPARC64_VII  Cpuset: 0x0000ff00  Online: 0x0000ff00  Allowed: 0x0000ff00
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00000300  Online: 0x00000300  Allowed: 0x00000300
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000100  Online: 0x00000100  Allowed: 0x00000100
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000200  Online: 0x00000200  Allowed: 0x00000200
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00000c00  Online: 0x00000c00  Allowed: 0x00000c00
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000400  Online: 0x00000400  Allowed: 0x00000400
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000800  Online: 0x00000800  Allowed: 0x00000800
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00003000  Online: 0x00003000  Allowed: 0x00003000
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00001000  Online: 0x00001000  Allowed: 0x00001000
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00002000  Online: 0x00002000  Allowed: 0x00002000
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x0000c000  Online: 0x0000c000  Allowed: 0x0000c000
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00004000  Online: 0x00004000  Allowed: 0x00004000
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00008000  Online: 0x00008000  Allowed: 0x00008000
 Mapper requested: NULL  Last mapper: round_robin
 Mapping policy: BYCORE  Ranking policy: SLOT  Binding policy: HWTHREAD[HWTHREAD]
 Cpu set: NULL  PPR: NULL
 Num new daemons: 0  New daemon starting vpid INVALID  Num nodes: 1
 Data for node: rs0.informatik.hs-fulda.de  Launch id: -1  State: 2
   Daemon: [[25428,0],0]  Daemon launched: True
   Num slots: 1  Slots in use: 1  Oversubscribed: TRUE
   Num slots allocated: 1  Max slots: 0
   Username on node: NULL
   Num procs: 5  Next node_rank: 5
   Data for proc: [[25428,1],0]  Pid: 0  Local rank: 0  Node rank: 0  App rank: 0
     State: INITIALIZED  Restarts: 0  App_context: 0  Locale: 0-1  Binding: 0[0]
   Data for proc: [[25428,1],1]  Pid: 0  Local rank: 1  Node rank: 1  App rank: 1
     State: INITIALIZED  Restarts: 0  App_context: 0  Locale: 2-3  Binding: 2[2]
   Data for proc: [[25428,1],2]  Pid: 0  Local rank: 2  Node rank: 2  App rank: 2
     State: INITIALIZED  Restarts: 0  App_context: 0  Locale: 4-5  Binding: 4[4]
   Data for proc: [[25428,1],3]  Pid: 0  Local rank: 3  Node rank: 3  App rank: 3
     State: INITIALIZED  Restarts: 0  App_context: 0  Locale: 6-7  Binding: 6[6]
   Data for proc: [[25428,1],4]  Pid: 0  Local rank: 4  Node rank: 4  App rank: 4
     State: INITIALIZED  Restarts: 0  App_context: 0  Locale: 8-9  Binding: 8[8]
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 7157 on node
rs0.informatik.hs-fulda.de exited on signal 10 (Bus Error).
--------------------------------------------------------------------------
[rs0:07157] *** Process received signal ***
[rs0:07157] Signal: Bus Error (10)
[rs0:07157] Signal code: Invalid address alignment (1)
[rs0:07157] Failing at address: 284f4d50495f4d
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_backtrace_print+0x14
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x503e30
/lib/sparcv9/libc.so.1:0xd8684
/lib/sparcv9/libc.so.1:0xcc1f8
/lib/sparcv9/libc.so.1:0xcc404
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x572eb0
[ Signal 2128894800 (?)]
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_hwloc_base_cset2str+0x64
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0x126f8
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0x135f0
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:orte_odls_base_default_launch_local+0x1e6c
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x53468c
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x5348b8
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_libevent2019_event_base_loop+0x1e8
/usr/local/openmpi-1.9_64_cc/bin/orterun:orterun+0x1ce4
/usr/local/openmpi-1.9_64_cc/bin/orterun:main+0x24
/usr/local/openmpi-1.9_64_cc/bin/orterun:_start+0x12c
[rs0:07157] *** End of error message ***
[rs0:07159] *** Process received signal ***
[rs0:07159] Signal: Segmentation Fault (11)
[rs0:07159] Signal code: Address not mapped (1)
[rs0:07159] Failing at address: 2300009000008
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_backtrace_print+0x14
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x503e30
/lib/sparcv9/libc.so.1:0xd8684
/lib/sparcv9/libc.so.1:0xcc1f8
/lib/sparcv9/libc.so.1:0xcc404
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x572eb0
[ Signal 2128894752 (?)]
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_hwloc_base_cset2str+0x64
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0x126f8
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0x135f0
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:orte_odls_base_default_launch_local+0x1e6c
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x53468c
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x5348b8
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_libevent2019_event_base_loop+0x1e8
/usr/local/openmpi-1.9_64_cc/bin/orterun:orterun+0x1ce4
/usr/local/openmpi-1.9_64_cc/bin/orterun:main+0x24
/usr/local/openmpi-1.9_64_cc/bin/orterun:_start+0x12c
[rs0:07159] *** End of error message ***
[rs0:07161] *** Process received signal ***
[rs0:07161] Signal: Segmentation Fault (11)
[rs0:07161] Signal code: Address not mapped (1)
[rs0:07161] Failing at address: 900000001210e10
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_backtrace_print+0x14
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x503e30
/lib/sparcv9/libc.so.1:0xd8684
/lib/sparcv9/libc.so.1:0xcc1f8
/lib/sparcv9/libc.so.1:0xcc404
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x572eb0
[ Signal 2128894752 (?)]
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_hwloc_base_cset2str+0x64
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0x126f8
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0x135f0
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:orte_odls_base_default_launch_local+0x1e6c
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x53468c
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x5348b8
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_libevent2019_event_base_loop+0x1e8
/usr/local/openmpi-1.9_64_cc/bin/orterun:orterun+0x1ce4
/usr/local/openmpi-1.9_64_cc/bin/orterun:main+0x24
/usr/local/openmpi-1.9_64_cc/bin/orterun:_start+0x12c
[rs0:07161] *** End of error message ***
[rs0:07163] *** Process received signal ***
[rs0:07163] Signal: Bus Error (10)
[rs0:07163] Signal code: Invalid address alignment (1)
[rs0:07163] Failing at address: 9000000011f
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_backtrace_print+0x14
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x503e30
/lib/sparcv9/libc.so.1:0xd8684
/lib/sparcv9/libc.so.1:0xcc1f8
/lib/sparcv9/libc.so.1:0xcc404
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x572eb0
[ Signal 2128894800 (?)]
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_hwloc_base_cset2str+0x64
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0x126f8
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0x135f0
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:orte_odls_base_default_launch_local+0x1e6c
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x53468c
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x5348b8
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_libevent2019_event_base_loop+0x1e8
/usr/local/openmpi-1.9_64_cc/bin/orterun:orterun+0x1ce4
/usr/local/openmpi-1.9_64_cc/bin/orterun:main+0x24
/usr/local/openmpi-1.9_64_cc/bin/orterun:_start+0x12c
[rs0:07163] *** End of error message ***
[rs0:07165] *** Process received signal ***
[rs0:07165] Signal: Bus Error (10)
[rs0:07165] Signal code: Invalid address alignment (1)
[rs0:07165] Failing at address: 766572626f73655d
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_backtrace_print+0x14
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x503e30
/lib/sparcv9/libc.so.1:0xd8684
/lib/sparcv9/libc.so.1:0xcc1f8
/lib/sparcv9/libc.so.1:0xcc404
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x572eb0
[ Signal 2128894800 (?)]
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_hwloc_base_cset2str+0x64
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0x126f8
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0x135f0
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:orte_odls_base_default_launch_local+0x1e6c
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x53468c
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x5348b8
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_libevent2019_event_base_loop+0x1e8
/usr/local/openmpi-1.9_64_cc/bin/orterun:orterun+0x1ce4
/usr/local/openmpi-1.9_64_cc/bin/orterun:main+0x24
/usr/local/openmpi-1.9_64_cc/bin/orterun:_start+0x12c
[rs0:07165] *** End of error message ***

[rs0.informatik.hs-fulda.de:07166] mca:base:select:( ess) Querying component [env]
[rs0.informatik.hs-fulda.de:07166] mca:base:select:( ess) Skipping component [env]. Query failed to return a module
[rs0.informatik.hs-fulda.de:07166] mca:base:select:( ess) Querying component [hnp]
[rs0.informatik.hs-fulda.de:07166] mca:base:select:( ess) Query of component [hnp] set priority to 100
[rs0.informatik.hs-fulda.de:07166] mca:base:select:( ess) Querying component [singleton]
[rs0.informatik.hs-fulda.de:07166] mca:base:select:( ess) Skipping component [singleton]. Query failed to return a module
[rs0.informatik.hs-fulda.de:07166] mca:base:select:( ess) Querying component [tool]
[rs0.informatik.hs-fulda.de:07166] mca:base:select:( ess) Skipping component [tool]. Query failed to return a module
[rs0.informatik.hs-fulda.de:07166] mca:base:select:( ess) Selected component [hnp]
[rs0.informatik.hs-fulda.de:07166] [[INVALID],INVALID] Topology Info:
[rs0.informatik.hs-fulda.de:07166] Type: Machine  Number of child objects: 1
  Name=NULL  total=33554432KB  OSName=SunOS  OSRelease=5.10  OSVersion=Generic_147440-21  Architecture=sun4u
  Cpuset: 0x0000ffff  Online: 0x0000ffff  Allowed: 0x0000ffff
  Bind CPU proc: TRUE  Bind CPU thread: TRUE  Bind MEM proc: TRUE  Bind MEM thread: TRUE
  Type: NUMANode  Number of child objects: 2  Name=NULL  local=33554432KB  total=33554432KB  Cpuset: 0x0000ffff  Online: 0x0000ffff  Allowed: 0x0000ffff
    Type: Socket  Number of child objects: 4  Name=NULL  CPUType=sparcv9  CPUModel=SPARC64_VII  Cpuset: 0x000000ff  Online: 0x000000ff  Allowed: 0x000000ff
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00000003  Online: 0x00000003  Allowed: 0x00000003
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000001  Online: 0x00000001  Allowed: 0x00000001
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000002  Online: 0x00000002  Allowed: 0x00000002
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x0000000c  Online: 0x0000000c  Allowed: 0x0000000c
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000004  Online: 0x00000004  Allowed: 0x00000004
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000008  Online: 0x00000008  Allowed: 0x00000008
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00000030  Online: 0x00000030  Allowed: 0x00000030
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000010  Online: 0x00000010  Allowed: 0x00000010
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000020  Online: 0x00000020  Allowed: 0x00000020
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x000000c0  Online: 0x000000c0  Allowed: 0x000000c0
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000040  Online: 0x00000040  Allowed: 0x00000040
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000080  Online: 0x00000080  Allowed: 0x00000080
    Type: Socket  Number of child objects: 4  Name=NULL  CPUType=sparcv9  CPUModel=SPARC64_VII  Cpuset: 0x0000ff00  Online: 0x0000ff00  Allowed: 0x0000ff00
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00000300  Online: 0x00000300  Allowed: 0x00000300
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000100  Online: 0x00000100  Allowed: 0x00000100
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000200  Online: 0x00000200  Allowed: 0x00000200
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00000c00  Online: 0x00000c00  Allowed: 0x00000c00
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000400  Online: 0x00000400  Allowed: 0x00000400
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000800  Online: 0x00000800  Allowed: 0x00000800
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00003000  Online: 0x00003000  Allowed: 0x00003000
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00001000  Online: 0x00001000  Allowed: 0x00001000
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00002000  Online: 0x00002000  Allowed: 0x00002000
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x0000c000  Online: 0x0000c000  Allowed: 0x0000c000
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00004000  Online: 0x00004000  Allowed: 0x00004000
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00008000  Online: 0x00008000  Allowed: 0x00008000
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 7168 on node
rs0.informatik.hs-fulda.de exited on signal 11 (Segmentation Fault).
--------------------------------------------------------------------------
[rs0:07168] *** Process received signal ***
[rs0:07168] Signal: Segmentation Fault (11)
[rs0:07168] Signal code: Invalid permissions (2)
[rs0:07168] Failing at address: ffffffff7ee32090
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_backtrace_print+0x14
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x503e30
/lib/sparcv9/libc.so.1:0xd8684
/lib/sparcv9/libc.so.1:0xcc1f8
/lib/sparcv9/libc.so.1:0xcc404
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x572ecc
[ Signal 2128894776 (?)]
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_hwloc_base_cset2str+0x64
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0x126f8
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0x135f0
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:orte_odls_base_default_launch_local+0x1e6c
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x53468c
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x5348b8
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_libevent2019_event_base_loop+0x1e8
/usr/local/openmpi-1.9_64_cc/bin/orterun:orterun+0x1ce4
/usr/local/openmpi-1.9_64_cc/bin/orterun:main+0x24
/usr/local/openmpi-1.9_64_cc/bin/orterun:_start+0x12c
[rs0:07168] *** End of error message ***
[rs0:07170] *** Process received signal ***
[rs0:07170] Signal: Segmentation Fault (11)
[rs0:07170] Signal code: Address not mapped (1)
[rs0:07170] Failing at address: 0
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_backtrace_print+0x14
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x503e30
/lib/sparcv9/libc.so.1:0xd8684
/lib/sparcv9/libc.so.1:0xcc1f8
/lib/sparcv9/libc.so.1:0xcc404
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x572eb0
[ Signal 2128894752 (?)]
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_hwloc_base_cset2str+0x64
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0x126f8
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0x135f0
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:orte_odls_base_default_launch_local+0x1e6c
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x53468c
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x5348b8
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_libevent2019_event_base_loop+0x1e8
/usr/local/openmpi-1.9_64_cc/bin/orterun:orterun+0x1ce4
/usr/local/openmpi-1.9_64_cc/bin/orterun:main+0x24
/usr/local/openmpi-1.9_64_cc/bin/orterun:_start+0x12c
[rs0:07170] *** End of error message ***

[rs0.informatik.hs-fulda.de:07171] mca:base:select:( ess) Querying component [env]
[rs0.informatik.hs-fulda.de:07171] mca:base:select:( ess) Skipping component [env]. Query failed to return a module
[rs0.informatik.hs-fulda.de:07171] mca:base:select:( ess) Querying component [hnp]
[rs0.informatik.hs-fulda.de:07171] mca:base:select:( ess) Query of component [hnp] set priority to 100
[rs0.informatik.hs-fulda.de:07171] mca:base:select:( ess) Querying component [singleton]
[rs0.informatik.hs-fulda.de:07171] mca:base:select:( ess) Skipping component [singleton]. Query failed to return a module
[rs0.informatik.hs-fulda.de:07171] mca:base:select:( ess) Querying component [tool]
[rs0.informatik.hs-fulda.de:07171] mca:base:select:( ess) Skipping component [tool]. Query failed to return a module
[rs0.informatik.hs-fulda.de:07171] mca:base:select:( ess) Selected component [hnp]
[rs0.informatik.hs-fulda.de:07171] [[INVALID],INVALID] Topology Info:
[rs0.informatik.hs-fulda.de:07171] Type: Machine  Number of child objects: 1
  Name=NULL  total=33554432KB  OSName=SunOS  OSRelease=5.10  OSVersion=Generic_147440-21  Architecture=sun4u
  Cpuset: 0x0000ffff  Online: 0x0000ffff  Allowed: 0x0000ffff
  Bind CPU proc: TRUE  Bind CPU thread: TRUE  Bind MEM proc: TRUE  Bind MEM thread: TRUE
  Type: NUMANode  Number of child objects: 2  Name=NULL  local=33554432KB  total=33554432KB  Cpuset: 0x0000ffff  Online: 0x0000ffff  Allowed: 0x0000ffff
    Type: Socket  Number of child objects: 4  Name=NULL  CPUType=sparcv9  CPUModel=SPARC64_VII  Cpuset: 0x000000ff  Online: 0x000000ff  Allowed: 0x000000ff
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00000003  Online: 0x00000003  Allowed: 0x00000003
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000001  Online: 0x00000001  Allowed: 0x00000001
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000002  Online: 0x00000002  Allowed: 0x00000002
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x0000000c  Online: 0x0000000c  Allowed: 0x0000000c
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000004  Online: 0x00000004  Allowed: 0x00000004
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000008  Online: 0x00000008  Allowed: 0x00000008
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00000030  Online: 0x00000030  Allowed: 0x00000030
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000010  Online: 0x00000010  Allowed: 0x00000010
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000020  Online: 0x00000020  Allowed: 0x00000020
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x000000c0  Online: 0x000000c0  Allowed: 0x000000c0
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000040  Online: 0x00000040  Allowed: 0x00000040
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000080  Online: 0x00000080  Allowed: 0x00000080
    Type: Socket  Number of child objects: 4  Name=NULL  CPUType=sparcv9  CPUModel=SPARC64_VII  Cpuset: 0x0000ff00  Online: 0x0000ff00  Allowed: 0x0000ff00
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00000300  Online: 0x00000300  Allowed: 0x00000300
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000100  Online: 0x00000100  Allowed: 0x00000100
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000200  Online: 0x00000200  Allowed: 0x00000200
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00000c00  Online: 0x00000c00  Allowed: 0x00000c00
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000400  Online: 0x00000400  Allowed: 0x00000400
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000800  Online: 0x00000800  Allowed: 0x00000800
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00003000  Online: 0x00003000  Allowed: 0x00003000
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00001000  Online: 0x00001000  Allowed: 0x00001000
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00002000  Online: 0x00002000  Allowed: 0x00002000
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x0000c000  Online: 0x0000c000  Allowed: 0x0000c000
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00004000  Online: 0x00004000  Allowed: 0x00004000
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00008000  Online: 0x00008000  Allowed: 0x00008000
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 7173 on node
rs0.informatik.hs-fulda.de exited on signal 11 (Segmentation Fault).
--------------------------------------------------------------------------
[rs0:07173] *** Process received signal ***
[rs0:07173] Signal: Segmentation Fault (11)
[rs0:07173] Signal code: Invalid permissions (2)
[rs0:07173] Failing at address: ffffffff7ee32090
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_backtrace_print+0x14
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x503e30
/lib/sparcv9/libc.so.1:0xd8684
/lib/sparcv9/libc.so.1:0xcc1f8
/lib/sparcv9/libc.so.1:0xcc404
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x572ecc
[ Signal 2128894776 (?)]
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_hwloc_base_cset2str+0x64
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0x126f8
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0x135f0
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:orte_odls_base_default_launch_local+0x1e6c
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x53468c
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x5348b8
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_libevent2019_event_base_loop+0x1e8
/usr/local/openmpi-1.9_64_cc/bin/orterun:orterun+0x1ce4
/usr/local/openmpi-1.9_64_cc/bin/orterun:main+0x24
/usr/local/openmpi-1.9_64_cc/bin/orterun:_start+0x12c
[rs0:07173] *** End of error message ***
[rs0:07175] *** Process received signal ***
[rs0:07175] Signal: Segmentation Fault (11)
[rs0:07175] Signal code: Address not mapped (1)
[rs0:07175] Failing at address: 0
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_backtrace_print+0x14
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x503e30
/lib/sparcv9/libc.so.1:0xd8684
/lib/sparcv9/libc.so.1:0xcc1f8
/lib/sparcv9/libc.so.1:0xcc404
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x572eb0
[ Signal 2128894752 (?)]
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_hwloc_base_cset2str+0x64
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0x126f8
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0x135f0
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:orte_odls_base_default_launch_local+0x1e6c
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x53468c
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x5348b8
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_libevent2019_event_base_loop+0x1e8
/usr/local/openmpi-1.9_64_cc/bin/orterun:orterun+0x1ce4
/usr/local/openmpi-1.9_64_cc/bin/orterun:main+0x24
/usr/local/openmpi-1.9_64_cc/bin/orterun:_start+0x12c
[rs0:07175] *** End of error message ***

[rs0.informatik.hs-fulda.de:07190] mca:base:select:( ess) Querying component [env]
[rs0.informatik.hs-fulda.de:07190] mca:base:select:( ess) Skipping component [env]. Query failed to return a module
[rs0.informatik.hs-fulda.de:07190] mca:base:select:( ess) Querying component [hnp]
[rs0.informatik.hs-fulda.de:07190] mca:base:select:( ess) Query of component [hnp] set priority to 100
[rs0.informatik.hs-fulda.de:07190] mca:base:select:( ess) Querying component [singleton]
[rs0.informatik.hs-fulda.de:07190] mca:base:select:( ess) Skipping component [singleton]. Query failed to return a module
[rs0.informatik.hs-fulda.de:07190] mca:base:select:( ess) Querying component [tool]
[rs0.informatik.hs-fulda.de:07190] mca:base:select:( ess) Skipping component [tool]. Query failed to return a module
[rs0.informatik.hs-fulda.de:07190] mca:base:select:( ess) Selected component [hnp]
[rs0.informatik.hs-fulda.de:07190] [[INVALID],INVALID] Topology Info:
[rs0.informatik.hs-fulda.de:07190] Type: Machine  Number of child objects: 1
  Name=NULL  total=33554432KB  OSName=SunOS  OSRelease=5.10  OSVersion=Generic_147440-21  Architecture=sun4u
  Cpuset: 0x0000ffff  Online: 0x0000ffff  Allowed: 0x0000ffff
  Bind CPU proc: TRUE  Bind CPU thread: TRUE  Bind MEM proc: TRUE  Bind MEM thread: TRUE
  Type: NUMANode  Number of child objects: 2  Name=NULL  local=33554432KB  total=33554432KB  Cpuset: 0x0000ffff  Online: 0x0000ffff  Allowed: 0x0000ffff
    Type: Socket  Number of child objects: 4  Name=NULL  CPUType=sparcv9  CPUModel=SPARC64_VII  Cpuset: 0x000000ff  Online: 0x000000ff  Allowed: 0x000000ff
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00000003  Online: 0x00000003  Allowed: 0x00000003
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000001  Online: 0x00000001  Allowed: 0x00000001
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000002  Online: 0x00000002  Allowed: 0x00000002
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x0000000c  Online: 0x0000000c  Allowed: 0x0000000c
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000004  Online: 0x00000004  Allowed: 0x00000004
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000008  Online: 0x00000008  Allowed: 0x00000008
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00000030  Online: 0x00000030  Allowed: 0x00000030
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000010  Online: 0x00000010  Allowed: 0x00000010
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000020  Online: 0x00000020  Allowed: 0x00000020
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x000000c0  Online: 0x000000c0  Allowed: 0x000000c0
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000040  Online: 0x00000040  Allowed: 0x00000040
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000080  Online: 0x00000080  Allowed: 0x00000080
    Type: Socket  Number of child objects: 4  Name=NULL  CPUType=sparcv9  CPUModel=SPARC64_VII  Cpuset: 0x0000ff00  Online: 0x0000ff00  Allowed: 0x0000ff00
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00000300  Online: 0x00000300  Allowed: 0x00000300
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000100  Online: 0x00000100  Allowed: 0x00000100
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000200  Online: 0x00000200  Allowed: 0x00000200
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00000c00  Online: 0x00000c00  Allowed: 0x00000c00
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000400  Online: 0x00000400  Allowed: 0x00000400
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00000800  Online: 0x00000800  Allowed: 0x00000800
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x00003000  Online: 0x00003000  Allowed: 0x00003000
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00001000  Online: 0x00001000  Allowed: 0x00001000
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00002000  Online: 0x00002000  Allowed: 0x00002000
      Type: Core  Number of child objects: 2  Name=NULL  Cpuset: 0x0000c000  Online: 0x0000c000  Allowed: 0x0000c000
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00004000  Online: 0x00004000  Allowed: 0x00004000
        Type: PU  Number of child objects: 0  Name=NULL  Cpuset: 0x00008000  Online: 0x00008000  Allowed: 0x00008000
[rs0.informatik.hs-fulda.de:07190] MCW rank 0 bound to : [B./../../..][../../../..]
Wed Sep 26 12:43:09 CEST 2012

[rs0:07198] *** Process received signal ***
[rs0:07198] Signal: Bus Error (10)
[rs0:07198] Signal code: Invalid address alignment (1)
[rs0:07198] Failing at address: 3a
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_backtrace_print+0x14
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x503e30
/lib/sparcv9/libc.so.1:0xd8684
/lib/sparcv9/libc.so.1:0xcc1f8
/lib/sparcv9/libc.so.1:0xcc404
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x572eb0
[ Signal 2128894800 (?)]
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_hwloc_base_cset2str+0x64
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0x126f8
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0x135f0
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:orte_odls_base_default_launch_local+0x1e6c
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x53468c
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x5348b8
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_libevent2019_event_base_loop+0x1e8
/usr/local/openmpi-1.9_64_cc/bin/orterun:orterun+0x1ce4
/usr/local/openmpi-1.9_64_cc/bin/orterun:main+0x24
/usr/local/openmpi-1.9_64_cc/bin/orterun:_start+0x12c
[rs0:07198] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 7198 on node
rs0.informatik.hs-fulda.de exited on signal 10 (Bus Error).
--------------------------------------------------------------------------
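
A note on the signal codes in the traces above: "Bus Error (10)" with
"Signal code: Invalid address alignment (1)" is what Solaris on SPARC
delivers when a load or store uses a misaligned address, something that
x86 tolerates silently. That difference may be one reason the same
revision behaves differently here than on the Linux x86_64 box mentioned
in the quoted reply. The small program below is only a generic
illustration of that behaviour; it is not taken from the Open MPI
sources.

  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      char buf[16];
      long value;
      long *p;

      memset(buf, 0x41, sizeof(buf));

      /* buf + 1 is not 8-byte aligned.  Loading a long through it
         normally raises SIGBUS ("invalid address alignment") on SPARC,
         while the same access simply works (more slowly) on x86. */
      p = (long *)(buf + 1);
      value = *p;

      printf("read %ld\n", value);
      return 0;
  }

On the SPARC machine this typically aborts with a bus error at the
dereference; on x86_64 it just prints a value.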