[OMPI users] Problem with openmpi and infiniband
Hello,

I am new to this list, where I hope to find a solution for a problem that I have been having for quite a long time. I run various versions of Open MPI (from 1.1.2 to 1.2.8) on a cluster with InfiniBand interconnects that I both use and administer. The OpenFabrics stack is OFED-1.2.5, the compilers are gcc 4.2 and Intel, and the queue manager is SGE 6.0u8.

The trouble is with an MPI code that runs fine with an Open MPI 1.1.2 library compiled without InfiniBand support (I have tested the scalability of the code up to 64 cores, the nodes have 4 or 8 cores, and the results are exactly what I expect). If I try to use a version compiled for InfiniBand, however, only a subset of communications (the ones connecting cores in the same node) are enabled, and because of this the program fails (more precisely, it gets stuck in a perennial waiting phase). This happens with any combination of compilers/library releases (1.1.2, 1.2.7, 1.2.8) I have tried. On other codes, and in particular on benchmarks downloaded from the net, Open MPI over InfiniBand seems to work (I compared the latency with the tcp btl, so I am pretty sure that InfiniBand works). The two variables I have kept fixed are SGE and the OFED module stack. I would prefer not to touch them, if possible, because the cluster seems to run fine for other purposes.

My question is: does anyone have a suggestion on what I could try next? I am pretty sure that to get an answer I need to provide more details, which I am willing to do, but in more than two months of testing/trying/hoping/praying I have accumulated so much material and information that if I post everything in this e-mail I am likely to confuse a potential helper more than help him understand the problem.

Thank you in advance,
Biagio Lucini

--
Dr. Biagio Lucini
Department of Physics, Swansea University
Singleton Park, SA2 8PP Swansea (UK)
Tel. +44 (0)1792 602284
Re: [OMPI users] Problem with openmpi and infiniband
Hi Dorian,

thank you for your message.

doriankrause wrote:
[...]
Does the problem only show up with openmpi? Did you try to use mvapich (http://mvapich.cse.ohio-state.edu/) to test whether it is a hardware or software problem? (I don't know any other open-source MPI implementation which supports infiniband.)

I have had bad experiences with mpich, on which mvapich is based. The short answer to your question is yes, and it did not work, for other reasons (not even over ethernet). The interesting development today is that Intel MPI (which should be more or less mvapich2, if I am not wrong) seems to work; I will verify this also with mvapich2. This seems to point towards a problem with the OpenMPI libraries, but I have reservations: they seem to work even for complicated benchmarking tests (like the Intel Benchmark), AND I also had trouble with mpich, which I did not sort out. A possibility is that the problem is generated by the interaction between MPI, SGE and my code. I would love it if someone more experienced than me would take a look at the code (which unfortunately is Fortran). I will try to trim down the over 4000 lines to a manageable proof of concept, if anyone is interested in following this up, but it is unlikely to happen before the new year :-)

Thanks again,
Biagio
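A minimal reproducer of the kind mentioned above could be as small as the sketch below. It is in C rather than the original Fortran, every name and size in it is made up, and it only checks that plain point-to-point messages cross the fabric: launched with one process per node (for example, mpirun -np 6 --bynode -mca btl openib,self ./pingtest), it should either finish immediately or hang in the same way the application does.

----------------------------------------------------------------------
/* Hypothetical minimal reproducer (C, not the original Fortran code).
 * Every rank except 0 sends a small buffer to rank 0, which collects
 * them with blocking receives.  If inter-node communication over the
 * openib btl is broken, rank 0 blocks in MPI_Recv forever, mimicking
 * the "perennial waiting phase" described above. */
#include <mpi.h>
#include <stdio.h>

#define NWORDS 1024

int main(int argc, char **argv)
{
    int rank, size, src, i;
    double buf[NWORDS];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank != 0) {
        for (i = 0; i < NWORDS; i++)
            buf[i] = (double)rank;
        MPI_Send(buf, NWORDS, MPI_DOUBLE, 0, 99, MPI_COMM_WORLD);
    } else {
        for (src = 1; src < size; src++) {
            MPI_Recv(buf, NWORDS, MPI_DOUBLE, src, 99, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 0 received %d doubles from rank %d\n",
                   NWORDS, src);
        }
    }

    MPI_Finalize();
    return 0;
}
----------------------------------------------------------------------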
Re: [OMPI users] Problem with openmpi and infiniband
Pavel Shamis (Pasha) wrote:
Biagio Lucini wrote:
[...]
Do you use the OpenMPI version that is included in OFED? Were you able to run basic OFED/OMPI tests/benchmarks between two nodes?

Hi,

yes to both questions: the OMPI version is the one that comes with OFED (1.1.2-1) and the basic tests run fine. For instance, IMB-MPI1 (which is more than basic, as far as I can see) reports for the last test:

#---------------------------------------------------
# Benchmarking Barrier
# #processes = 6
#---------------------------------------------------
 #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
         1000        22.93        22.95        22.94

for the openib,self btl (6 processes, all processes on different nodes) and

#---------------------------------------------------
# Benchmarking Barrier
# #processes = 6
#---------------------------------------------------
 #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
         1000       191.30       191.42       191.34

for the tcp,self btl (same test). No anomalies for other tests (ping-pong, all-to-all etc.)

Thanks,
Biagio

--
Dr. Biagio Lucini
Department of Physics, Swansea University
Singleton Park, SA2 8PP Swansea (UK)
Tel. +44 (0)1792 602284
Re: [OMPI users] Problem with openmpi and infiniband
Tim Mattox wrote:
For your runs with Open MPI over InfiniBand, try using openib,sm,self for the BTL setting, so that shared memory communications are used within a node. It would give us another datapoint to help diagnose the problem. As for other things we would need to help diagnose the problem, please follow the advice on this FAQ entry, and the help page:
http://www.open-mpi.org/faq/?category=openfabrics#ofa-troubleshoot
http://www.open-mpi.org/community/help/

Dear Tim,

thank you for this pointer.

1) OFED: it is 1.2.5, from the OpenFabrics website.

2) Linux version: Scientific Linux (RH Enterprise remaster) v. 4.2, kernel 2.6.9-55.0.12.ELsmp.

3) Subnet manager: OpenSM.

4) ibv_devinfo:

hca_id: mthca0
        fw_ver:             1.0.800
        node_guid:          0002:c902:0022:b398
        sys_image_guid:     0002:c902:0022:b39b
        vendor_id:          0x02c9
        vendor_part_id:     25204
        hw_ver:             0xA0
        board_id:           MT_03B0120002
        phys_port_cnt:      1
        port: 1
                state:      PORT_ACTIVE (4)
                max_mtu:    2048 (4)
                active_mtu: 2048 (4)
                sm_lid:     9
                port_lid:   97
                port_lmc:   0x00

(no node is different from the others, as far as the problem is concerned)

5) ifconfig:

eth0      Link encap:Ethernet  HWaddr 00:17:31:E3:89:4A
          inet addr:10.0.0.12  Bcast:10.0.0.255  Mask:255.255.255.0
          inet6 addr: fe80::217:31ff:fee3:894a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:23348585 errors:0 dropped:0 overruns:0 frame:0
          TX packets:17247486 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:19410724189 (18.0 GiB)  TX bytes:14981325997 (13.9 GiB)
          Interrupt:209

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:5088 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5088 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2468843 (2.3 MiB)  TX bytes:2468843 (2.3 MiB)

6) ulimit -l: 8388608 (this is more than the physical memory on the node).

7) Output of ompi_info attached (I have also tried earlier releases).

8) Description of the problem: a program seems to communicate correctly over the TCP network, but not over the InfiniBand network. The program is structured in such a way that if the communication does not happen, a loop becomes infinite. So there is no error message, just a program entering an infinite loop. The command line I use is

mpirun -mca btl openib,sm,self

(with openib replaced by tcp in the case of communication over ethernet). I could include the path and the value of the variable LD_LIBRARY_PATH, but they won't tell you much, since the installation directory is non-standard (/opt/ompi128-intel/bin for the path and /opt/ompi128-intel/lib for the libs).

I hope to have provided all the required info; if you need more, or some of it in more detail, please let me know.
Many thanks,
Biagio Lucini

                Open MPI: 1.2.8
   Open MPI SVN revision: r19718
                Open RTE: 1.2.8
   Open RTE SVN revision: r19718
                    OPAL: 1.2.8
       OPAL SVN revision: r19718
                  Prefix: /opt/ompi128-intel
 Configured architecture: x86_64-unknown-linux-gnu
           Configured by: root
           Configured on: Tue Dec 23 12:33:51 GMT 2008
          Configure host: master.cluster
                Built by: root
                Built on: Tue Dec 23 12:38:34 GMT 2008
              Built host: master.cluster
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
 Fortran90 bindings size: small
              C compiler: icc
     C compiler absolute: /opt/intel/cce/9.1.045/bin/icc
            C++ compiler: icpc
   C++ compiler absolute: /opt/intel/cce/9.1.045/bin/icpc
      Fortran77 compiler: ifort
  Fortran77 compiler abs: /opt/intel/fce/9.1.040/bin/ifort
      Fortran90 compiler: ifort
  Fortran90 compiler abs: /opt/intel/fce/9.1.040/bin/ifort
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: no
          Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
   Heterogeneous support: yes
 mpirun default --prefix: no
           MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.8)
              MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.8)
           MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.8)
           MCA maffinity:
Re: [OMPI users] Problem with openmpi and infiniband
Jeff Squyres wrote:
Another thing to try is a change that we made late in the Open MPI v1.2 series with regards to IB:
http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion

Thanks, this is something worth investigating. What would be the exact syntax to turn off pml_ob1_use_early_completion? Do you think the same problem can also happen in the 1.1(.2) release, which is the one I have also tested, since it comes with OFED 1.2.5? Would it be worth trying 1.3? So far I have avoided it, since it is tagged as a prerelease.

Thanks,
Biagio
Re: [OMPI users] openMPI, transfer data from multiple sources to one destination
Jack Bryan wrote:
HI, I need to transfer data from multiple sources to one destination. [...]

Probably it is not the best solution, but what I did was the following:

a) the receiver listens for transmitters ready to send data with MPI_IRECV. The message overwrites a logical array, which is initialised to false. If some transmitter is ready, then the corresponding entry in the array is updated to true;

b) when ready, a transmitter sends a true (with MPI_SEND) to the receiver and opens a communication channel for the data, with another call to MPI_SEND;

c) after having checked for availability of all the transmitters, the receiver cycles over the transmitters that are ready to communicate (entry of the logical array equal to true) and opens a communication channel in blocking mode (MPI_RECV) with each of them, in turn;

d) the receiver reinitialises the logical array to false and goes back to (a) above.

This scheme (sketched in the code below) assumes that you do not need the data in any particular order. Hope it works for you.

Biagio

--
Dr. Biagio Lucini
Department of Physics, Swansea University
Singleton Park, SA2 8PP Swansea (UK)
Tel. +44 (0)1792 602284
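A rough translation of the scheme above into C (the original code is Fortran; the tags, buffer sizes and variable names here are all invented) might look like the following — a sketch of the handshaking only, not a drop-in implementation.

----------------------------------------------------------------------
/* Sketch of the many-to-one scheme described above, in C.  Rank 0 is
 * the receiver, all other ranks are transmitters.  For brevity each
 * transmitter is served exactly once; in the full scheme the receiver
 * would re-post the MPI_Irecv (step a) and loop. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define TAG_READY 1
#define TAG_DATA  2
#define NDATA     100

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                                   /* receiver */
        int *ready = calloc(size, sizeof(int));        /* the "logical array" */
        MPI_Request *req = malloc(size * sizeof(MPI_Request));
        double data[NDATA];
        int src, served = 0;

        /* (a) listen for transmitters announcing that they are ready */
        for (src = 1; src < size; src++)
            MPI_Irecv(&ready[src], 1, MPI_INT, src, TAG_READY,
                      MPI_COMM_WORLD, &req[src]);

        /* (c) cycle over the transmitters; whenever one has flagged
         * itself as ready, receive its data in blocking mode */
        while (served < size - 1) {
            for (src = 1; src < size; src++) {
                int flag = 0;
                if (req[src] == MPI_REQUEST_NULL)      /* already served */
                    continue;
                MPI_Test(&req[src], &flag, MPI_STATUS_IGNORE);
                if (!flag)
                    continue;                          /* not ready yet */
                /* ready[src] is now "true": open a blocking channel */
                MPI_Recv(data, NDATA, MPI_DOUBLE, src, TAG_DATA,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                printf("received a block from rank %d\n", src);
                ready[src] = 0;                        /* (d) reset the flag */
                served++;
            }
        }
        free(ready);
        free(req);
    } else {                                           /* transmitter */
        int ready = 1;
        double data[NDATA];
        int i;
        for (i = 0; i < NDATA; i++)
            data[i] = (double)rank;
        /* (b) announce readiness, then send the data itself */
        MPI_Send(&ready, 1, MPI_INT, 0, TAG_READY, MPI_COMM_WORLD);
        MPI_Send(data, NDATA, MPI_DOUBLE, 0, TAG_DATA, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
----------------------------------------------------------------------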
Re: [OMPI users] Problem with openmpi and infiniband
Pavel Shamis (Pasha) wrote:
Your problem may well be related to the known issue with early completions. The exact syntax is:
--mca pml_ob1_use_early_completion 0

Thanks, I am currently looking for the first available spot on the cluster, then I will try this. I'll let you know.

Biagio

--
Dr. Biagio Lucini
Department of Physics, Swansea University
Singleton Park, SA2 8PP Swansea (UK)
Tel. +44 (0)1792 602284
Re: [OMPI users] Problem with openmpi and infiniband
Pavel Shamis (Pasha) wrote:
Another thing to try is a change that we made late in the Open MPI v1.2 series with regards to IB:
http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion

Thanks, this is something worth investigating. What would be the exact syntax to turn off pml_ob1_use_early_completion?

Your problem may well be related to the known issue with early completions. The exact syntax is:
--mca pml_ob1_use_early_completion 0

Unfortunately this did not help: still the same problem. Here is the script I run (last line for the tcp test, the commented-out mpirun line before it for the openib test):

--------------------------------------------------------------------
#!/bin/bash
#$ -S /bin/bash
#Set out, error and job name
#$ -o run2.out
#$ -e run2.err
#$ -N su3_01Jan
#Number of nodes for mpi (18 in this case)
#$ -pe make 38
# The batchsystem should use the current directory as working directory.
#$ -cwd
export LD_LIBRARY_PATH=/opt/numactl-0.6.4/:/opt/sge-6.0u8/lib/lx24-amd64:/opt/ompi128-intel/lib
echo LD_LIBRARY_PATH $LD_LIBRARY_PATH
ldd ./k-string
ulimit -l 8388608
ulimit -a
export PATH=$PATH:/opt/ompi128-intel/bin
which mpirun
#The actual mpirun command
#mpirun -np $NSLOTS -mca btl openib,sm,self --mca pml_ob1_use_early_completion 0 ./k-string
mpirun -np $NSLOTS -mca btl tcp,sm,self ./k-string
--------------------------------------------------------------------

The script also produces extra diagnostics for the path, library path, locked memory, etc. All seems ok, and as before the tcp run goes well, while the openib run has communication problems (it looks like no communication channel can be opened or recognised). I will try OMPI 1.3rc2 (as has been suggested); failing that, I will try to isolate a test case, to see if the problem can be reproduced on other systems. Meanwhile, I am happy to listen to any suggestion you might have.

Thanks,
Biagio
Re: [OMPI users] Problem with openmpi and infiniband
The test was in fact ok; I have also verified it on 30 processors. Meanwhile I tried OMPI 1.3rc2, with which the application fails on infiniband. I hope this will give some clue (or at least be useful to finalise the release of OpenMPI 1.3). I remind the mailing list that I use the OFED 1.2.5 release. The only change with respect to the last time is the use of OMPI 1.3rc2 instead of 1.2.8. To avoid boring the mailing list, I don't repeat details I have already provided (like the command line parameters), on which we seem to have agreed that there is no problem. However, if you want to know more, please ask. The error file as produced by SGE is attached.

Thanks,
Biagio

Lenny Verkhovsky wrote:
Hi,
just to make sure: you wrote in a previous mail that you tested IMB-MPI1 and it "reports for the last test" results for "#processes = 6". Since you have 4- and 8-core machines, this test could have run on a single 8-core machine over shared memory and not over Infiniband, as you suspected. You can rerun the IMB-MPI1 test with -mca btl self,openib to be sure that the test does not use shared memory or tcp.
Lenny.

On 12/24/08, Biagio Lucini wrote:
[...]

[[5963,1],13][btl_openib_component.c:2893:handle_wc] from node24 to: node11 error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR status number 13 for wr_id 37779456 opcode 0 qp_idx 0
[[5963,1],12][btl_openib_component.c:2893:handle_wc] from node23 to: node11 error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR status number 13 for wr_id 37779456 opcode 0 qp_idx 0
[[5963,1],8][btl_openib_component.c:2893:handle_wc] from node9 to: node11 error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR status number 13 for wr_id 37779456 opcode 0 qp_idx 0
[[5963,1],11][btl_openib_component.c:2893:handle_wc] from node20 to: node11 error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR status number 13 for wr_id 37779456 opcode 0 qp_idx 0
[[5963,1],9][btl_openib_component.c:2893:handle_wc] from node18 to: node11 error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR status number 13 for wr_id 37779456 opcode 0 qp_idx 0
[[5963,1],4][btl_openib_component.c:2893:handle_wc] from node13 to: node11 error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR status number 13 for wr_id 37779456 opcode 0 qp_idx 0
[[5963,1],3][btl_openib_component.c:2893:handle_wc] from node12 to: node11 error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR status number 13 for wr_id 37779456 opcode 0 qp_idx 0
[[5963,1],6][btl_openib_component.c:2893:handle_wc] from node15 to: node11 error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR status number 13 for wr_id 37779456 opcode 0 qp_idx 0
[[5963,1],1][btl_openib_component.c:2893:handle_wc] from node10 to: node11 error polling LP CQ with status
Re: [OMPI users] Problem with openmpi and infiniband
Jeff Squyres wrote:
On Jan 7, 2009, at 6:28 PM, Biagio Lucini wrote:
[[5963,1],13][btl_openib_component.c:2893:handle_wc] from node24 to: node11 error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR status number 13 for wr_id 37779456 opcode 0 qp_idx 0

Ah! If we're dealing with an RNR retry exceeded, this is *usually* a physical layer problem on the IB fabric. Have you run a complete layer 0 / physical set of diagnostics on the fabric to know that it is completely working properly?

Once again, apologies for the delayed answer, but I always need to find a free spot to perform checks without disrupting the activity of the other users, who seem to be happy with the present status (this includes the other users of infiniband).

What I have done is to run the Intel MPI Benchmark in stress mode over 40 nodes and then, on exactly the same nodes, my code. The errors for my code are attached. I do not attach the Intel benchmark file, since it is 100k and might upset someone, but I can send it on request. If I pick a random test:

#-----------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 40
#-----------------------------------------------------------------------
   #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]  Mbytes/sec
        0         1000        19.70        20.37        19.87        0.00
        1         1000        12.80        13.61        13.25        0.28
        2         1000        12.94        13.73        13.39        0.56
        4         1000        12.93        13.24        13.14        1.15
        8         1000        12.46        12.89        12.65        2.37
       16         1000        14.59        15.35        15.00        3.98
       32         1000        12.83        13.42        13.26        9.09
       64         1000        13.17        13.49        13.31       18.10
      128         1000        13.83        14.40        14.20       33.90
      256         1000        16.47        17.34        16.89       56.33
      512         1000        22.72        23.29        22.99       83.85
     1024         1000        35.09        36.30        35.72      107.62
     2048         1000        71.28        72.46        71.91      107.81
     4096         1000       139.78       141.55       140.72      110.38
     8192         1000       237.86       240.13       239.10      130.14
    16384         1000       481.37       486.15       484.10      128.56
    32768         1000       864.89       872.48       869.35      143.27
    65536          640      1607.97      1629.53      1620.19      153.42
   131072          320      3106.92      3196.91      3160.10      156.40
   262144          160      5970.66      6333.02      6185.35      157.90
   524288           80     16322.10     18509.40     17627.17      108.05
  1048576           40     31194.17     40981.73     37056.97       97.60
  2097152           20     38023.90     77308.80     61021.08      103.48
  4194304           10     20423.82    143447.80     84832.93      111.54
------------------------------------------------------------------------

As you can see, the Intel benchmark runs fine on this set of nodes; I have been running it for a few hours without any problem. On the other hand, my job still has this problem. To recap: both are compiled with openmpi; the benchmark looks fine, while my job refuses to establish communication among processes, without giving any error message with OMPI 1.2.x (various x) and giving the attached error message with 1.3rc2.

I have tried ibcheckerrors, which reports:

#warn: counter SymbolErrors = 65535 (threshold 10)
#warn: counter LinkDowned = 20 (threshold 10)
#warn: counter XmtDiscards = 65535 (threshold 100)
Error check on lid 1 (MT47396 Infiniscale-III Mellanox Technologies) port all: FAILED
#warn: counter SymbolErrors = 65535 (threshold 10)
Error check on lid 1 (MT47396 Infiniscale-III Mellanox Technologies) port 10: FAILED
# Checked Switch: nodeguid 0x000b8c002347 with failure
#warn: counter XmtDiscards = 65535 (threshold 100)
Error check on lid 1 (MT47396 Infiniscale-III Mellanox Technologies) port 1: FAILED

## Summary: 25 nodes checked, 0 bad nodes found
## 48 ports checked, 2 ports have errors beyond threshold

Admittedly, not encouraging. The output of ibnetdiscover is attached. I should add that the cluster (including infiniband) is currently being used. Unfortunately, my experience with infiniband is not adequate to take this much further on my own. Any further clue on possible problems is very welcome.

Many thanks for your attention,
Biagio

--
Dr. Biagio Lucini
Department of Physics, Swansea University
Singleton Park, SA2 8PP Swansea (UK)
Tel. +44 (0)1792 602284
Re: [OMPI users] openib RETRY EXCEEDED ERROR
Bogdan Costescu wrote:
Brett Pemberton wrote:
[[1176,1],0][btl_openib_component.c:2905:handle_wc] from tango092.vpac.org to: tango090 error polling LP CQ with status RETRY EXCEEDED ERROR status number 12 for wr_id 38996224 opcode 0 qp_idx 0

I've seen this error with Mellanox ConnectX cards and OFED 1.2.x with all versions of OpenMPI that I have tried (1.2.x and pre-1.3) and some MVAPICH versions, from which I have concluded that the problem lies in the lower levels (OFED or IB card firmware). Indeed, after the installation of OFED 1.3.x and a possible firmware update (not sure about the firmware, as I don't admin that cluster), these errors have disappeared.

I can confirm this: I had a similar problem over Christmas, for which I asked for help on this list. In fact the problem was not with OpenMPI but with the OFED stack: an upgrade of the latter (and an upgrade of the firmware, although, once again, the OFED drivers were complaining about the firmware being too old) fixed the problem. We did both upgrades at once, so, as in Brett's case, I am not sure which one played the major role.

Biagio

--
Dr. Biagio Lucini
Department of Physics, Swansea University
Singleton Park, SA2 8PP Swansea (UK)
Tel. +44 (0)1792 602284
[OMPI users] "casual" error
We have an application that runs for a very long time with 16 processes (the time is of the order of a few months; we do have checkpoints, but this won't be the issue). It has happened twice so far that it fails with the error message appended below after running undisturbed for 20-25 days. This error is not systematically reproducible, and I believe this is not just because the program is parallel. We use openmpi-1.2.5 as distributed in the RH 5.2-clone Scientific Linux, on which our cluster is based. Is this stack suggesting anything to eyes more trained than mine?

Many thanks,
Biagio Lucini

-------------------------------------------------------------------------
[node20:04178] *** Process received signal ***
[node20:04178] Signal: Segmentation fault (11)
[node20:04178] Signal code: Address not mapped (1)
[node20:04178] Failing at address: 0x2aaadb8b31a0
[node20:04178] [ 0] /lib64/libpthread.so.0 [0x2b5d9c3ebe80]
[node20:04178] [ 1] /usr/lib64/openmpi/1.2.5-gcc/lib/libopen-pal.so.0(_int_malloc+0x1d4) [0x2b5d9ccb2f84]
[node20:04178] [ 2] /usr/lib64/openmpi/1.2.5-gcc/lib/libopen-pal.so.0(malloc+0x93) [0x2b5d9ccb4d93]
[node20:04178] [ 3] /lib64/libc.so.6 [0x2b5d9d77729a]
[node20:04178] [ 4] /usr/lib64/libstdc++.so.6(_ZNSt12__basic_fileIcE4openEPKcSt13_Ios_Openmodei+0x54) [0x2b5d9bf05cb4]
[node20:04178] [ 5] /usr/lib64/libstdc++.so.6(_ZNSt13basic_filebufIcSt11char_traitsIcEE4openEPKcSt13_Ios_Openmode+0x83) [0x2b5d9beb45c3]
[node20:04178] [ 6] ./k-string(wait_thread_+0x2a1) [0x42e101]
[node20:04178] [ 7] ./k-string(MAIN__+0x2a72) [0x4212d2]
[node20:04178] [ 8] ./k-string(main+0xe) [0x42e2ce]
[node20:04178] [ 9] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b5d9d7338b4]
[node20:04178] [10] ./k-string(__gxx_personality_v0+0xb9) [0x404719]
[node20:04178] *** End of error message ***
mpirun noticed that job rank 0 with PID 4152 on node node19 exited on signal 15 (Terminated).
Re: [OMPI users] "casual" error
Many thanks for your help; it was not clear to me whether it was opal, my application or the standard C libs that were causing the segfault. It is already good news that the problem is not at the level of OpenMPI, since this would have meant upgrading that library. My first reaction would be to say that there is nothing wrong with my code (which has already passed the valgrind test) and that the problem should be in the libc, but I agree with you that this is a very unlikely possibility, especially given that we do some remapping of the memory. Hence, I will give a second look with valgrind and a third with efence, and see if there is some bug that managed to survive the extensive testing that the code has undergone up to now.

Thanks again,
Biagio

George Bosilca wrote:
Absolutely :) The last few entries on the stack are from OPAL (one of the Open MPI libraries) that trap the segfault. Everything else indicates where the segfault happened. What I can tell from this stack trace is the following: the problem started in your function wait_thread, which called one of the functions from libstdc++ (based on the C++ naming conventions and the name from the stack, _ZNSt13basic_filebufIcSt11char_traitsIcEE4openEPKcSt13_Ios_Openmode, I guess it was open), which called some undetermined function from the libc ... which segfaulted. It is pretty strange to segfault in a standard function; they are usually pretty well protected, except if you do something blatantly wrong (such as messing up the memory). I suggest using some memory checker tools such as valgrind to check the memory consistency of your application.

george.

On Mar 5, 2009, at 17:37, Biagio Lucini wrote:
[...]
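For illustration only (this is not taken from the code being discussed): the failure mode George describes — a segfault inside malloc long after the real mistake — typically comes from writing past the end of an allocated block, as in the toy C program below, and a run under valgrind points straight at the offending write.

----------------------------------------------------------------------
/* Toy example (unrelated to the k-string code) of the kind of memory
 * error that can make a later, perfectly legitimate call to malloc()
 * crash: the out-of-bounds write silently corrupts the allocator's
 * bookkeeping, and the damage only shows up later.  Running this under
 * "valgrind ./a.out" reports the invalid writes at the memset line. */
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *buf, *other;

    buf = malloc(16);

    /* Bug: 32 bytes written into a 16-byte block.  Nothing visible
     * happens here; the heap metadata next to buf is trashed. */
    memset(buf, 'x', 32);

    /* A later allocation (or free) may now crash or abort deep inside
     * the C library, far away from the real culprit. */
    other = malloc(64);

    free(other);
    free(buf);
    return 0;
}
----------------------------------------------------------------------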