[OMPI users] Problem with openmpi and infiniband

2008-12-23 Thread Biagio Lucini

Hello,

I am new to this list, where I hope to find a solution for a problem 
that I have been having for quite a long time.


I run various versions of Open MPI (from 1.1.2 to 1.2.8) on a cluster 
with InfiniBand interconnects that I both use and administer. The 
OpenFabrics stack is OFED-1.2.5, the compilers are gcc 4.2 and Intel, 
and the queue manager is SGE 6.0u8.


The trouble is with an MPI code that runs fine with an Open MPI 1.1.2 
library compiled without InfiniBand support (I have tested the 
scalability of the code up to 64 cores; the nodes have 4 or 8 cores; the 
results are exactly what I expect). However, if I try to use a version 
compiled for InfiniBand, only a subset of the communications (the ones 
connecting cores within the same node) are enabled, and because of this 
the program fails (more precisely, it gets stuck waiting forever). 
This happens with any combination of compilers and library releases (1.1.2, 
1.2.7, 1.2.8) I have tried. On other codes, and in particular on 
benchmarks downloaded from the net, Open MPI over InfiniBand seems to 
work (I compared the latency with the tcp btl, so I am pretty sure that 
InfiniBand works). The two variables I have kept fixed are SGE and the 
OFED module stack. I would prefer not to touch them, if possible, because 
the cluster seems to run fine for other purposes.


My question is: does anyone have a suggestion on what I could try next?
I'm pretty sure that to get an answer I need to provide more details, 
which I am willing to do, but in more than two months of 
testing/trying/hoping/praying I have accumulated so much material and 
information that if I post everything in this e-mail I am likely to 
confuse a potential helper, more than help them understand the problem.


Thank you in advance,
Biagio Lucini

--
=

Dr. Biagio Lucini
Department of Physics, Swansea University
Singleton Park, SA2 8PP Swansea (UK)
Tel. +44 (0)1792 602284

=


Re: [OMPI users] Problem with openmpi and infiniband

2008-12-23 Thread Biagio Lucini

Hi Dorian,

thank you for your message.

doriankrause wrote:



The trouble is with an MPI code that runs fine with an Open MPI 1.1.2 
library compiled without InfiniBand support (I have tested the 
scalability of the code up to 64 cores; the nodes have 4 or 8 cores; 
the results are exactly what I expect). However, if I try to use a version 
compiled for InfiniBand, only a subset of the communications (the ones 
connecting cores within the same node) are enabled, and because of this 
the program fails (more precisely, it gets stuck waiting forever). 
This happens with any combination of compilers and library 
releases (1.1.2, 1.2.7, 1.2.8) I have tried. On other codes, and in 
particular on benchmarks downloaded from the net, Open MPI over 
InfiniBand seems to work (I compared the latency with the tcp btl, so 
I am pretty sure that InfiniBand works). The two variables I have kept 
fixed are SGE and the OFED module stack. I would prefer not to touch 
them, if possible, because the cluster seems to run fine for other 
purposes.


Does the problem only show up with Open MPI? Did you try to use MVAPICH 
(http://mvapich.cse.ohio-state.edu/) to test whether it is a hardware or 
a software problem? (I don't know of any other open-source MPI 
implementation that supports InfiniBand.)




I have had bad experiences with MPICH, on which MVAPICH is based. The 
short answer to your question is yes, and it did not work, for other 
reasons (not even over ethernet). The interesting development today is 
that Intel MPI (which should be more or less MVAPICH2, if I am not wrong) 
seems to work (I will verify this also with MVAPICH2). This seems to 
point towards a problem with the Open MPI libraries, but I have 
reservations: they seem to work even for complicated benchmarking tests 
(like the Intel MPI Benchmarks), AND I also have trouble with MPICH, which I 
did not sort out. A possibility is that the problem is generated by the 
interaction between MPI, SGE and my code. I would love it if someone more 
experienced than me would take a look at the code (which unfortunately is 
Fortran). I will try to trim down the over 4000 lines to a manageable proof 
of concept if anyone is interested in following this up, but it is 
unlikely to happen before the new year :-)


Thanks again,
Biagio


Re: [OMPI users] Problem with openmpi and infiniband

2008-12-24 Thread Biagio Lucini

Pavel Shamis (Pasha) wrote:

Biagio Lucini wrote:

Hello,

I am new to this list, where I hope to find a solution for a problem
that I have been having for quite a long time.

I run various versions of Open MPI (from 1.1.2 to 1.2.8) on a cluster
with InfiniBand interconnects that I both use and administer. The
OpenFabrics stack is OFED-1.2.5, the compilers are gcc 4.2 and Intel,
and the queue manager is SGE 6.0u8.

Do you use the Open MPI version that is included in OFED? Were you able
to run basic OFED/OMPI tests/benchmarks between two nodes?



Hi,

yes to both questions: the OMPI version is the one that comes with OFED 
(1.1.2-1) and the basic tests run fine. For instance, IMB-MPI1 (which is 
more than basic, as far as I can see) reports for the last test:


#---------------------------------------------------
# Benchmarking Barrier
# #processes = 6
#---------------------------------------------------
 #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
         1000        22.93        22.95        22.94


for the openib,self btl (6 processes, all processes on different nodes)
and

#---------------------------------------------------
# Benchmarking Barrier
# #processes = 6
#---------------------------------------------------
 #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
         1000       191.30       191.42       191.34

for the tcp,self btl (same test)

No anomalies for other tests (ping-pong, all-to-all etc.)

Thanks,
Biagio


--
=

Dr. Biagio Lucini   
Department of Physics, Swansea University
Singleton Park, SA2 8PP Swansea (UK)
Tel. +44 (0)1792 602284

=


Re: [OMPI users] Problem with openmpi and infiniband

2008-12-27 Thread Biagio Lucini

Tim Mattox wrote:

For your runs with Open MPI over InfiniBand, try using openib,sm,self
for the BTL setting, so that shared memory communications are used
within a node.  It would give us another datapoint to help diagnose
the problem.  As for other things we would need to help diagnose the
problem, please follow the advice on this FAQ entry, and the help page:
http://www.open-mpi.org/faq/?category=openfabrics#ofa-troubleshoot
http://www.open-mpi.org/community/help/
  

Dear Tim,

thank you for this pointer.

1) OFED: it's 1.2.5, from the OpenFabrics website
2) Linux version: Scientific Linux (a RH Enterprise remaster) v. 4.2, 
kernel 2.6.9-55.0.12.ELsmp

3) Subnet manager: OpenSM
4) ibv_devinfo:
hca_id: mthca0
        fw_ver:             1.0.800
        node_guid:          0002:c902:0022:b398
        sys_image_guid:     0002:c902:0022:b39b
        vendor_id:          0x02c9
        vendor_part_id:     25204
        hw_ver:             0xA0
        board_id:           MT_03B0120002
        phys_port_cnt:      1
        port:   1
                state:          PORT_ACTIVE (4)
                max_mtu:        2048 (4)
                active_mtu:     2048 (4)
                sm_lid:         9
                port_lid:       97
                port_lmc:       0x00

(no node is different from the others, as far as the problem is concerned)

5) ifconfig:
eth0  Link encap:Ethernet  HWaddr 00:17:31:E3:89:4A 
 inet addr:10.0.0.12  Bcast:10.0.0.255  Mask:255.255.255.0

 inet6 addr: fe80::217:31ff:fee3:894a/64 Scope:Link
 UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
 RX packets:23348585 errors:0 dropped:0 overruns:0 frame:0
 TX packets:17247486 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:1000
 RX bytes:19410724189 (18.0 GiB)  TX bytes:14981325997 (13.9 GiB)
 Interrupt:209

loLink encap:Local Loopback 
 inet addr:127.0.0.1  Mask:255.0.0.0

 inet6 addr: ::1/128 Scope:Host
 UP LOOPBACK RUNNING  MTU:16436  Metric:1
 RX packets:5088 errors:0 dropped:0 overruns:0 frame:0
 TX packets:5088 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:0
 RX bytes:2468843 (2.3 MiB)  TX bytes:2468843 (2.3 MiB)

6) ulimit -l
8388608
(this is more than the physical memory on the node)

7) output of ompi_info attached (I have also tried earlier releases)

8) description of the problem: a program seems to communicate correctly 
over the TCP network, but not over the InfiniBand network. The program is 
structured in such a way that if the communication does not happen, a 
loop becomes infinite. So there is no error message, just a program 
entering an infinite loop (a minimal sketch of this kind of structure 
follows below).
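
For illustration, here is a minimal sketch (in C; the real code is Fortran 
and its details are not shown in this thread, so the structure below is an 
assumption, not the actual program) of how a receive loop can hang silently 
instead of erroring out when a message is never delivered:

---------------------------------------------------------------
#include <mpi.h>

/* Hypothetical receive loop: this rank waits for a "ready" message
 * from a peer on another node.  If the openib BTL never delivers the
 * message, MPI_Iprobe keeps returning flag == 0 and the loop spins
 * forever -- no error is reported, the program just waits. */
void wait_for_peer(int peer, MPI_Comm comm)
{
    int flag = 0;
    MPI_Status status;

    while (!flag)
        MPI_Iprobe(peer, 0 /* tag */, comm, &flag, &status);

    /* the message is now available: receive it */
    int ready;
    MPI_Recv(&ready, 1, MPI_INT, peer, 0, comm, MPI_STATUS_IGNORE);
}
---------------------------------------------------------------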


The command line I use is

mpirun -mca btl openib,sm,self 

(with openib replaced by tcp in the case of communication over ethernet).

I could include the PATH and the value of the variable LD_LIBRARY_PATH, 
but they would not tell you much, since the installation directory is 
non-standard (/opt/ompi128-intel/bin for the path and 
/opt/ompi128-intel/lib for the libs).


I hope I have provided all the required info; if you need more, or some 
of it in more detail, please let me know.


Many thanks,
Biagio Lucini
Open MPI: 1.2.8
   Open MPI SVN revision: r19718
Open RTE: 1.2.8
   Open RTE SVN revision: r19718
OPAL: 1.2.8
   OPAL SVN revision: r19718
  Prefix: /opt/ompi128-intel
 Configured architecture: x86_64-unknown-linux-gnu
   Configured by: root
   Configured on: Tue Dec 23 12:33:51 GMT 2008
  Configure host: master.cluster
Built by: root
Built on: Tue Dec 23 12:38:34 GMT 2008
  Built host: master.cluster
  C bindings: yes
C++ bindings: yes
  Fortran77 bindings: yes (all)
  Fortran90 bindings: yes
 Fortran90 bindings size: small
  C compiler: icc
 C compiler absolute: /opt/intel/cce/9.1.045/bin/icc
C++ compiler: icpc
   C++ compiler absolute: /opt/intel/cce/9.1.045/bin/icpc
  Fortran77 compiler: ifort
  Fortran77 compiler abs: /opt/intel/fce/9.1.040/bin/ifort
  Fortran90 compiler: ifort
  Fortran90 compiler abs: /opt/intel/fce/9.1.040/bin/ifort
 C profiling: yes
   C++ profiling: yes
 Fortran77 profiling: yes
 Fortran90 profiling: yes
  C++ exceptions: no
  Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
 MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
 libltdl support: yes
   Heterogeneous support: yes
 mpirun default --prefix: no
   MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.8)
  MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.8)
   MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.8)
   MCA maffinity:

Re: [OMPI users] Problem with openmpi and infiniband

2008-12-27 Thread Biagio Lucini

Jeff Squyres wrote:
Another thing to try is a change that we made late in the Open MPI 
v1.2 series with regards to IB:



http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion 



Thanks, this is something worth investigating. What would be the exact 
syntax to use to turn off pml_ob1_use_early_completion? Do you think the 
same problem can also happen in the 1.1(.2) release, which is the one I 
have also tested, since it comes with OFED 1.2.5? Would it be worth 
trying 1.3? So far I have avoided it, since it is tagged as "prerelease".


Thanks,
Biagio




Re: [OMPI users] openMPI, transfer data from multiple sources to one destination

2008-12-29 Thread Biagio Lucini

Jack Bryan wrote:

HI,

I need to transfer data from multiple sources to one destination.

[...]

Probably it is not the best solution, but what I did was the following:

a) the receiver listens for transmitters ready to send data with 
MPI_IRECV. The messages overwrite a logical array, which is initialised 
to false. If a transmitter is ready, the corresponding entry in 
the array is updated to true;
b) when ready, a transmitter sends a true (with MPI_SEND) to the receiver 
and then opens a communication channel for the data with another call to 
MPI_SEND;
c) after having checked the availability of all the transmitters, the 
receiver cycles over the transmitters that are ready to communicate 
(entry of the logical array equal to true) and opens a communication 
channel in blocking mode (MPI_RECV) with each of them, in turn;
d) the receiver reinitialises the logical array to false and goes back 
to (a) above.


This implementation assumes that you do not need the data in any 
particular order; a minimal sketch in C follows below.
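
For reference, here is a minimal sketch of this scheme in C (the original 
code is Fortran and is not posted here, so the tags, the payload size and 
the single round of collection below are illustrative assumptions rather 
than the actual implementation):

---------------------------------------------------------------
#include <mpi.h>
#include <stdlib.h>

#define READY_TAG 0
#define DATA_TAG  1
#define DATA_LEN  4                 /* illustrative payload size */

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (rank == 0) {                              /* receiver */
        int *ready = calloc(nprocs, sizeof(int)); /* the "logical array", initialised to false */
        MPI_Request *req = malloc(nprocs * sizeof(MPI_Request));
        double buf[DATA_LEN];

        /* (a) post a non-blocking receive for each transmitter's "ready" flag */
        for (int src = 1; src < nprocs; src++)
            MPI_Irecv(&ready[src], 1, MPI_INT, src, READY_TAG,
                      MPI_COMM_WORLD, &req[src]);

        int remaining = nprocs - 1;
        while (remaining > 0) {
            for (int src = 1; src < nprocs; src++) {
                if (req[src] == MPI_REQUEST_NULL) /* already collected from this one */
                    continue;
                int done = 0;
                MPI_Test(&req[src], &done, MPI_STATUS_IGNORE);
                if (done && ready[src]) {
                    /* (c) the flag is true: receive the data in blocking mode */
                    MPI_Recv(buf, DATA_LEN, MPI_DOUBLE, src, DATA_TAG,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                    ready[src] = 0;               /* (d) reset the flag */
                    remaining--;
                }
            }
        }
        /* in the real scheme the receiver would now go back to (a) for the next round */
        free(ready);
        free(req);
    } else {                                      /* transmitters */
        int flag = 1;
        double buf[DATA_LEN];
        for (int i = 0; i < DATA_LEN; i++)
            buf[i] = rank + 0.1 * i;              /* dummy payload */
        /* (b) announce readiness, then send the data with a second MPI_Send */
        MPI_Send(&flag, 1, MPI_INT, 0, READY_TAG, MPI_COMM_WORLD);
        MPI_Send(buf, DATA_LEN, MPI_DOUBLE, 0, DATA_TAG, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
---------------------------------------------------------------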


Hope it works for you.

Biagio

--
=

Dr. Biagio Lucini   
Department of Physics, Swansea University
Singleton Park, SA2 8PP Swansea (UK)
Tel. +44 (0)1792 602284

=


Re: [OMPI users] Problem with openmpi and infiniband

2008-12-29 Thread Biagio Lucini

Pavel Shamis (Pasha) wrote:


Your problem may well be related to the known issue with early
completions. The exact syntax is:

--mca pml_ob1_use_early_completion 0



Thanks,

I am currently looking for the first available spot on the cluster, then 
I will try this. I'll let you know.


Biagio

--
=

Dr. Biagio Lucini   
Department of Physics, Swansea University
Singleton Park, SA2 8PP Swansea (UK)
Tel. +44 (0)1792 602284

=


Re: [OMPI users] Problem with openmpi and infiniband

2009-01-02 Thread Biagio Lucini

Pavel Shamis (Pasha) wrote:


Another thing to try is a change that we made late in the Open MPI 
v1.2 series with regards to IB:



http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion 



Thanks, this is something worth investigating. What would be the 
exact syntax to use to turn off pml_ob1_use_early_completion? 
Your problem may well be related to the known issue with early 
completions. The exact syntax is:

--mca pml_ob1_use_early_completion 0

Unfortunately this did not help: still the same problem. Here is the 
script I run; the last line is for the tcp test, the previous 
(commented-out) line for the openib test.

--------------------------------------------------------------------------
#!/bin/bash
#$ -S /bin/bash

#Set out, error and job name
#$ -o run2.out
#$ -e run2.err
#$ -N su3_01Jan

#Number of nodes for mpi (18 in this case)
#$ -pe make 38

# The batch system should use the current directory as working directory.
#$ -cwd


export LD_LIBRARY_PATH=/opt/numactl-0.6.4/:/opt/sge-6.0u8/lib/lx24-amd64:/opt/ompi128-intel/lib

echo LD_LIBRARY_PATH $LD_LIBRARY_PATH
ldd ./k-string

ulimit -l 8388608
ulimit -a

export PATH=$PATH:/opt/ompi128-intel/bin
which mpirun

#The actual mpirun command
#mpirun -np $NSLOTS -mca btl openib,sm,self --mca pml_ob1_use_early_completion 0 ./k-string

mpirun -np $NSLOTS -mca btl tcp,sm,self ./k-string
--------------------------------------------------------------------------

This also contains extra diagnostics for the path, library path, locked 
memory, etc. All seems OK, and as before the tcp run goes well while the 
openib run has communication problems (it looks like no communication 
channel can be opened or recognised). I will try OMPI 1.3rc2 (as has 
been suggested); failing that, I will try to isolate a test case to see 
whether the problem can be reproduced on other systems. Meanwhile, I'm 
happy to hear any suggestions you might have.


Thanks,
Biagio


Re: [OMPI users] Problem with openmpi and infiniband

2009-01-07 Thread Biagio Lucini
The test was in fact OK; I have also verified it on 30 processors. 
Meanwhile I tried OMPI 1.3rc2, with which the application fails on 
InfiniBand; I hope this will give some clue (or at least be useful for 
finalising the release of Open MPI 1.3). I remind the mailing list that I 
use the OFED 1.2.5 release. The only change with respect to last time 
is the use of OMPI 1.3rc2 instead of 1.2.8. To avoid boring the mailing 
list, I will not repeat details I have already provided (like the command-
line parameters), on which we seem to have agreed that there is no 
problem. However, if you want to know more, please ask.


The error file as produced by SGE is attached.

Thanks,
Biagio

Lenny Verkhovsky wrote:

Hi, just to make sure:

you wrote in the previous mail that you tested IMB-MPI1 and that it
"reports for the last test" results for "#processes = 6". Since you
have 4- and 8-core machines, this test could have been run on a single
8-core machine over shared memory and not over InfiniBand, as you
suspected.

You can rerun the IMB-MPI1 test with -mca btl self,openib to be sure
that the test does not use shared memory or tcp.

Lenny.



On 12/24/08, Biagio Lucini  wrote:
  

Pavel Shamis (Pasha) wrote:



Biagio Lucini wrote:

  

Hello,

I am new to this list, where I hope to find a solution for a problem
that I have been having for quite a long time.

I run various versions of Open MPI (from 1.1.2 to 1.2.8) on a cluster
with InfiniBand interconnects that I both use and administer. The
OpenFabrics stack is OFED-1.2.5, the compilers are gcc 4.2 and Intel,
and the queue manager is SGE 6.0u8.



Do you use the Open MPI version that is included in OFED? Were you able
to run basic OFED/OMPI tests/benchmarks between two nodes?


  

 Hi,

 yes to both questions: the OMPI version is the one that comes with OFED
(1.1.2-1) and the basic tests run fine. For instance, IMB-MPI1 (which is
more than basic, as far as I can see) reports for the last test:

 #---------------------------------------------------
 # Benchmarking Barrier
 # #processes = 6
 #---------------------------------------------------
  #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
          1000        22.93        22.95        22.94


 for the openib,self btl (6 processes, all processes on different nodes)
 and

 #---------------------------------------------------
 # Benchmarking Barrier
 # #processes = 6
 #---------------------------------------------------
  #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
          1000       191.30       191.42       191.34

 for the tcp,self btl (same test)

 No anomalies for other tests (ping-pong, all-to-all etc.)

 Thanks,
 Biagio


 --
 =========

 Dr. Biagio Lucini
 Department of Physics, Swansea University
 Singleton Park, SA2 8PP Swansea (UK)
 Tel. +44 (0)1792 602284

 =


[[5963,1],13][btl_openib_component.c:2893:handle_wc] from node24 to: node11 error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR status number 13 for wr_id 37779456 opcode 0 qp_idx 0
[[5963,1],12][btl_openib_component.c:2893:handle_wc] from node23 to: node11 error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR status number 13 for wr_id 37779456 opcode 0 qp_idx 0
[[5963,1],8][btl_openib_component.c:2893:handle_wc] from node9 to: node11 error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR status number 13 for wr_id 37779456 opcode 0 qp_idx 0
[[5963,1],11][btl_openib_component.c:2893:handle_wc] from node20 to: node11 error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR status number 13 for wr_id 37779456 opcode 0 qp_idx 0
[[5963,1],9][btl_openib_component.c:2893:handle_wc] from node18 to: node11 error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR status number 13 for wr_id 37779456 opcode 0 qp_idx 0
[[5963,1],4][btl_openib_component.c:2893:handle_wc] from node13 to: node11 error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR status number 13 for wr_id 37779456 opcode 0 qp_idx 0
[[5963,1],3][btl_openib_component.c:2893:handle_wc] from node12 to: node11 error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR status number 13 for wr_id 37779456 opcode 0 qp_idx 0
[[5963,1],6][btl_openib_component.c:2893:handle_wc] from node15 to: node11 error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR status number 13 for wr_id 37779456 opcode 0 qp_idx 0
[[5963,1],1][btl_openib_component.c:2893:handle_wc] from node10 to: node11 error polling LP CQ with status

Re: [OMPI users] Problem with openmpi and infiniband

2009-01-15 Thread Biagio Lucini

Jeff Squyres wrote:

On Jan 7, 2009, at 6:28 PM, Biagio Lucini wrote:


[[5963,1],13][btl_openib_component.c:2893:handle_wc] from node24 to: node11 error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR status number 13 for wr_id 37779456 opcode 0 qp_idx 0


Ah! If we're dealing with an RNR retry exceeded, this is *usually* a
physical-layer problem on the IB fabric.

Have you run a complete layer 0 / physical set of diagnostics on the
fabric to know that it is working properly?



Once again, apologies for the delayed answer, but I always need to find 
a free spot to perform checks without disrupting the activity of the 
other users, who seem to be happy with the present status (this includes 
the other users of infiniband).


What I have done is run the Intel MPI Benchmark in stress mode over 40 
nodes, and then run my code on exactly the same nodes. The errors for my 
code are attached. I do not attach the Intel benchmark output, since it is 
100k and might upset someone, but I can send it on request. If I pick a 
random test:


#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 40
#-----------------------------------------------------------------------------
   #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
        0         1000        19.70        20.37        19.87         0.00
        1         1000        12.80        13.61        13.25         0.28
        2         1000        12.94        13.73        13.39         0.56
        4         1000        12.93        13.24        13.14         1.15
        8         1000        12.46        12.89        12.65         2.37
       16         1000        14.59        15.35        15.00         3.98
       32         1000        12.83        13.42        13.26         9.09
       64         1000        13.17        13.49        13.31        18.10
      128         1000        13.83        14.40        14.20        33.90
      256         1000        16.47        17.34        16.89        56.33
      512         1000        22.72        23.29        22.99        83.85
     1024         1000        35.09        36.30        35.72       107.62
     2048         1000        71.28        72.46        71.91       107.81
     4096         1000       139.78       141.55       140.72       110.38
     8192         1000       237.86       240.13       239.10       130.14
    16384         1000       481.37       486.15       484.10       128.56
    32768         1000       864.89       872.48       869.35       143.27
    65536          640      1607.97      1629.53      1620.19       153.42
   131072          320      3106.92      3196.91      3160.10       156.40
   262144          160      5970.66      6333.02      6185.35       157.90
   524288           80     16322.10     18509.40     17627.17       108.05
  1048576           40     31194.17     40981.73     37056.97        97.60
  2097152           20     38023.90     77308.80     61021.08       103.48
  4194304           10     20423.82    143447.80     84832.93       111.54

#-----------------------------------------------------------------------------


As you can see, the Intel benchmark runs fine on this set of nodes; I 
have been running it for a few hours without any problem. On the other 
hand, my job still has the problem. To recap:

both are compiled with Open MPI; the benchmark looks fine, while my job
refuses to establish communication among processes, giving no error 
message with OMPI 1.2.x (various x) and giving the attached error 
message with 1.3rc2.

I have tried ibcheckerrors, which reports:

#warn: counter SymbolErrors = 65535 (threshold 10)
#warn: counter LinkDowned = 20  (threshold 10)
#warn: counter XmtDiscards = 65535  (threshold 100)
Error check on lid 1 (MT47396 Infiniscale-III Mellanox Technologies) 
port all:  FAILED

#warn: counter SymbolErrors = 65535 (threshold 10)
Error check on lid 1 (MT47396 Infiniscale-III Mellanox Technologies) 
port 10:  FAILED

# Checked Switch: nodeguid 0x000b8c002347 with failure
#warn: counter XmtDiscards = 65535  (threshold 100)
Error check on lid 1 (MT47396 Infiniscale-III Mellanox Technologies) 
port 1:  FAILED


## Summary: 25 nodes checked, 0 bad nodes found
##  48 ports checked, 2 ports have errors beyond threshold

Admittedly, not encouraging. The output of ibnetdiscover is attached.

I should add that the cluster (including InfiniBand) is currently being 
used. Unfortunately, my experience with InfiniBand is not adequate to 
take these checks much further on my own.

Any further clue on possible problems is very welcome.

Many thanks for your attention,
Biagio

--
=

Dr. Biagio Lucini   
Department of Physics, Swansea University
Singleton Park, SA2 8PP Swansea (UK)
Tel. +44 (0)1792 602284

=

Re: [OMPI users] openib RETRY EXCEEDED ERROR

2009-02-27 Thread Biagio Lucini

Bogdan Costescu wrote:


Brett Pemberton  wrote:


[[1176,1],0][btl_openib_component.c:2905:handle_wc] from
tango092.vpac.org to: tango090 error polling LP CQ with status RETRY
EXCEEDED ERROR status number 12 for wr_id 38996224 opcode 0 qp_idx 0


I've seen this error with Mellanox ConnectX cards and OFED 1.2.x with
all versions of OpenMPI that I have tried (1.2.x and pre-1.3) and some
MVAPICH versions, from which I have concluded that the problem lies in
the lower levels (OFED or IB card firmware). Indeed after the
installation of OFED 1.3.x and a possible firmware update (not sure
about the firmware as I don't admin that cluster), these errors have
disappeared.



I can confirm this: I had a similar problem over Christmas, for which I 
asked for help on this list. In fact the problem was not with Open MPI 
but with the OFED stack: an upgrade of the latter (and an upgrade of the 
firmware, although once again the OFED drivers were complaining about 
the firmware being too old) fixed the problem. We did both upgrades at 
once, so, as in Brett's case, I am not sure which one played the major role.


Biagio

--
=====

Dr. Biagio Lucini   
Department of Physics, Swansea University
Singleton Park, SA2 8PP Swansea (UK)
Tel. +44 (0)1792 602284

=


[OMPI users] "casual" error

2009-03-05 Thread Biagio Lucini
We have an application that runs for a very long time with 16 processes 
(the run time is of the order of a few months; we do have checkpoints, but 
that is not the issue here). It has happened twice so far that it fails 
with the error message appended below, after running undisturbed for 
20-25 days. This error is not systematically reproducible, 
and I believe this is not just because the program is parallel. We use 
openmpi-1.2.5 as distributed in the RH 5.2-clone Scientific Linux, on 
which our cluster is based. Is this stack suggesting anything to eyes 
more trained than mine?


Many thanks,
Biagio Lucini

-

[node20:04178] *** Process received signal ***
[node20:04178] Signal: Segmentation fault (11)
[node20:04178] Signal code: Address not mapped (1)
[node20:04178] Failing at address: 0x2aaadb8b31a0
[node20:04178] [ 0] /lib64/libpthread.so.0 [0x2b5d9c3ebe80]
[node20:04178] [ 1] /usr/lib64/openmpi/1.2.5-gcc/lib/libopen-pal.so.0(_int_malloc+0x1d4) [0x2b5d9ccb2f84]
[node20:04178] [ 2] /usr/lib64/openmpi/1.2.5-gcc/lib/libopen-pal.so.0(malloc+0x93) [0x2b5d9ccb4d93]
[node20:04178] [ 3] /lib64/libc.so.6 [0x2b5d9d77729a]
[node20:04178] [ 4] /usr/lib64/libstdc++.so.6(_ZNSt12__basic_fileIcE4openEPKcSt13_Ios_Openmodei+0x54) [0x2b5d9bf05cb4]
[node20:04178] [ 5] /usr/lib64/libstdc++.so.6(_ZNSt13basic_filebufIcSt11char_traitsIcEE4openEPKcSt13_Ios_Openmode+0x83) [0x2b5d9beb45c3]
[node20:04178] [ 6] ./k-string(wait_thread_+0x2a1) [0x42e101]
[node20:04178] [ 7] ./k-string(MAIN__+0x2a72) [0x4212d2]
[node20:04178] [ 8] ./k-string(main+0xe) [0x42e2ce]
[node20:04178] [ 9] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b5d9d7338b4]
[node20:04178] [10] ./k-string(__gxx_personality_v0+0xb9) [0x404719]
[node20:04178] *** End of error message ***
mpirun noticed that job rank 0 with PID 4152 on node node19 exited on signal 15 (Terminated).




Re: [OMPI users] "casual" error

2009-03-05 Thread Biagio Lucini
Many thanks for your help; it was not clear to me whether it was OPAL, 
my application, or the standard C libraries that was causing the segfault. 
It is already good news that the problem is not at the level of Open MPI, 
since that would have meant upgrading the library. My first reaction 
would be to say that there is nothing wrong with my code (which has 
already passed the valgrind test) and that the problem must be in libc, 
but I agree with you that this is a very unlikely possibility, 
especially given that we do some remapping of the memory. Hence, I will 
take a second look with valgrind and a third with efence, and see if 
there is some bug that managed to survive the extensive testing that the 
code has undergone up to now.


Thanks again,
Biagio

George Bosilca wrote:
Absolutely :) The last few entries on the stack are from OPAL (one of 
the Open MPI libraries), which traps the segfault. Everything else 
indicates where the segfault happened. What I can tell from this stack 
trace is the following: the problem started in your function 
wait_thread, which called one of the functions from libstdc++ 
(based on the C++ naming conventions and the name on the stack, 
_ZNSt13basic_filebufIcSt11char_traitsIcEE4openEPKcSt13_, I guess it was 
open), which called some undetermined function from libc ... which 
segfaulted.


It is pretty strange to segfault in a standard function; they are 
usually pretty well protected, unless you do something blatantly 
wrong (such as messing up the memory). I suggest using a memory-checker 
tool such as valgrind to check the memory consistency of your 
application.


  george.

On Mar 5, 2009, at 17:37 , Biagio Lucini wrote:

We have an application that runs for a very long time with 16 
processes (the run time is of the order of a few months; we do have 
checkpoints, but that is not the issue here). It has happened twice so 
far that it fails with the error message appended below, after running 
undisturbed for 20-25 days. This error is not systematically 
reproducible, and I believe this is not just because the program is 
parallel. We use openmpi-1.2.5 as distributed in the RH 5.2-clone 
Scientific Linux, on which our cluster is based. Is this stack 
suggesting anything to eyes more trained than mine?


Many thanks,
Biagio Lucini

- 



[node20:04178] *** Process received signal ***
[node20:04178] Signal: Segmentation fault (11)
[node20:04178] Signal code: Address not mapped (1)
[node20:04178] Failing at address: 0x2aaadb8b31a0
[node20:04178] [ 0] /lib64/libpthread.so.0 [0x2b5d9c3ebe80]
[node20:04178] [ 1] /usr/lib64/openmpi/1.2.5-gcc/lib/libopen-pal.so.0(_int_malloc+0x1d4) [0x2b5d9ccb2f84]
[node20:04178] [ 2] /usr/lib64/openmpi/1.2.5-gcc/lib/libopen-pal.so.0(malloc+0x93) [0x2b5d9ccb4d93]
[node20:04178] [ 3] /lib64/libc.so.6 [0x2b5d9d77729a]
[node20:04178] [ 4] /usr/lib64/libstdc++.so.6(_ZNSt12__basic_fileIcE4openEPKcSt13_Ios_Openmodei+0x54) [0x2b5d9bf05cb4]
[node20:04178] [ 5] /usr/lib64/libstdc++.so.6(_ZNSt13basic_filebufIcSt11char_traitsIcEE4openEPKcSt13_Ios_Openmode+0x83) [0x2b5d9beb45c3]
[node20:04178] [ 6] ./k-string(wait_thread_+0x2a1) [0x42e101]
[node20:04178] [ 7] ./k-string(MAIN__+0x2a72) [0x4212d2]
[node20:04178] [ 8] ./k-string(main+0xe) [0x42e2ce]
[node20:04178] [ 9] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b5d9d7338b4]
[node20:04178] [10] ./k-string(__gxx_personality_v0+0xb9) [0x404719]
[node20:04178] *** End of error message ***
mpirun noticed that job rank 0 with PID 4152 on node node19 exited on signal 15 (Terminated).



