Re: [OMPI users] openib RETRY EXCEEDED ERROR

2009-02-27 Thread Bogdan Costescu
Brett Pemberton wrote: [[1176,1],0][btl_openib_component.c:2905:handle_wc] from tango092.vpac.org to: tango090 error polling LP CQ with status RETRY EXCEEDED ERROR status number 12 for wr_id 38996224 opcode 0 qp_idx 0 I've seen this error with Mellanox ConnectX cards and OFED 1.2.x with al

Re: [OMPI users] openib RETRY EXCEEDED ERROR

2009-02-27 Thread Biagio Lucini
Bogdan Costescu wrote: Brett Pemberton wrote: [[1176,1],0][btl_openib_component.c:2905:handle_wc] from tango092.vpac.org to: tango090 error polling LP CQ with status RETRY EXCEEDED ERROR status number 12 for wr_id 38996224 opcode 0 qp_idx 0 I've seen this error with Mellanox ConnectX cards

[OMPI users] libmpi_f90.so not being built

2009-02-27 Thread Tiago Silva
Hi, I am trying to build openmpi 1.3 on Cent_OS with gcc and the lahey f95 compiler with the following configuration: ./configure F77=/share/apps/lf6481/bin/lfc FC=/share/apps/lf6481/bin/lfc --prefix=/opt/openmpi-1.3_lfc When I "make install all" the process fails to build libmpi_f90.la

[OMPI users] Problem with cascading derived data types

2009-02-27 Thread Markus Blatt
Hi, In one of my applications I am using cascaded derived MPI datatypes created with MPI_Type_struct. One of these types is used to send just a part (one MPI_CHAR) of a struct consisting of an int followed by two chars, i.e., the int at the beginning is/should be ignored. This works fine if I use

Re: [OMPI users] libmpi_f90.so not being built

2009-02-27 Thread Jeff Squyres
Can you please send all the information listed here: http://www.open-mpi.org/community/help/ On Feb 27, 2009, at 6:38 AM, Tiago Silva wrote: Hi, I am trying to build openmpi 1.3 on Cent_OS with gcc and the lahey f95 compiler with the following configuration: ./configure F77=/share/a

Re: [OMPI users] libmpi_f90.so not being built

2009-02-27 Thread Tiago Silva
ok, here is the complete output in the tgz file attached. The output is slightly different as I am now only using "make all" and not installing. I did a full "make clean" and "rm -fr /*" and the already exists but is empty. Thanks ts-output.tgz Description: Binary data

[OMPI users] more XGrid Problems with openmpi1.2.9

2009-02-27 Thread Ricardo Fernández-Perea
Hi It seems to me more like time issues. All the runs end with something similar to Exception Type: EXC_BAD_ACCESS (SIGSEGV) Exception Codes: KERN_INVALID_ADDRESS at 0x45485308 Crashed Thread: 0 Thread 0 Crashed: 0 libSystem.B.dylib 0x95208f04 strcmp + 84 1 libopen-rte.

[OMPI users] Fwd: more XGrid Problems with openmpi1.2.9 (error find)

2009-02-27 Thread Ricardo Fernández-Perea
Found the problem: in orte_pls_xgrid_terminate_orteds, orte_pls_base_get_active_daemons is being called as orte_pls_base_get_active_daemons(&daemons, jobid) when the correct call is orte_pls_base_get_active_daemons(&daemons, jobid, attrs). Yours, Ricardo Hi It seems to me more like time

Re: [OMPI users] Latest SVN failures

2009-02-27 Thread Rolf Vandevaart
I just tried trunk-1.4a1r20458 and I did not see this error, although my configuration was rather different. I ran across 100 2-CPU sparc nodes, np=256, connected with TCP. Hopefully George's comment helps out with this issue. One other thought to see whether SGE has anything to do with thi

Re: [OMPI users] 3.5 seconds before application launches

2009-02-27 Thread Vittorio Giovara
Hello, and thanks for both replies, I've tried to run non-mpi program but i still measured some latency time before starting, something around 2 seconds this time. SSH should be properly configured, in fact i can login to both machines without password; openmpi and mvapich use ssh as default. i'v

[OMPI users] TCP instead of openIB doesn't work

2009-02-27 Thread Vittorio Giovara
Hello, I'm posting here another problem with my installation. I wanted to benchmark the differences between the tcp and openib transports. If I run a simple non-MPI application I get: randori ~ # mpirun --mca btl tcp,self -np 2 -host randori -host tatami hostname randori tatami but as soon as i switch to

Re: [OMPI users] TCP instead of openIB doesn't work

2009-02-27 Thread Ralph Castain
I'm not entirely sure what is causing the problem here, but one thing does stand out. You have specified two -host options for the same application - this is not our normal syntax. The usual way of specifying this would be: mpirun --mca btl tcp,self -np 2 -host randori,tatami hostname I'

Re: [OMPI users] openib RETRY EXCEEDED ERROR

2009-02-27 Thread Matt Hughes
2009/2/26 Brett Pemberton : > [[1176,1],0][btl_openib_component.c:2905:handle_wc] from tango092.vpac.org > to: tango090 error polling LP CQ with status RETRY EXCEEDED ERROR status > number 12 for wr_id 38996224 opcode 0 qp_idx 0 What OS are you using? I've seen this error and many other Infiniban

Re: [OMPI users] openib RETRY EXCEEDED ERROR

2009-02-27 Thread Åke Sandgren
On Fri, 2009-02-27 at 09:54 -0700, Matt Hughes wrote: > 2009/2/26 Brett Pemberton : > > [[1176,1],0][btl_openib_component.c:2905:handle_wc] from tango092.vpac.org > > to: tango090 error polling LP CQ with status RETRY EXCEEDED ERROR status > > number 12 for wr_id 38996224 opcode 0 qp_idx 0 > > Wha

Re: [OMPI users] openib RETRY EXCEEDED ERROR

2009-02-27 Thread Pavel Shamis (Pasha)
Usually a "retry exceeded error" points to a network issue, such as a bad cable or a bad connector. You can use the ibdiagnet tool to debug the network: http://linux.die.net/man/1/ibdiagnet. This tool is part of OFED. Pasha Brett Pemberton wrote: Hey, I've had a couple of errors recently, of

[OMPI users] defining different values for same environment variable

2009-02-27 Thread Nicolas Deladerriere
Hello, I am looking for a way to set an environment variable to a different value on each node before running an MPI executable (not just export the environment variable!). Let's say I have a cluster with two nodes (n001 and n002) and I want to set the environment variable GMON_OUT_PREFIX with d

Re: [OMPI users] defining different values for same environment variable

2009-02-27 Thread Matt Hughes
2009/2/27 Nicolas Deladerriere : > I am looking for a way to set environment variable with different value on > each node before running MPI executable. (not only export the environment > variable !) I typically use a script for things like this. So instead of specifying your executable directly
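The wrapper-script approach mentioned in this reply could look roughly like the sketch below. It is an illustration only, since the message is truncated here and the original script is not shown. The script name node_env.sh, the helper node_prefix, the "gmon" base name, and the use of `uname -n` to identify the node are all assumptions made for this example.

```shell
#!/bin/sh
# node_env.sh (hypothetical name): give GMON_OUT_PREFIX a per-node value,
# then run the real program in place of this script.

# node_prefix BASE: print "BASE.<node name>", e.g. "gmon.n001" on node n001.
# uname -n is used as an assumed way to get the node name.
node_prefix() {
    printf '%s.%s\n' "$1" "$(uname -n)"
}

# Only act when a command was passed on the command line.
if [ "$#" -gt 0 ]; then
    GMON_OUT_PREFIX=$(node_prefix gmon)
    export GMON_OUT_PREFIX
    # Replace the wrapper with the real executable and its arguments.
    exec "$@"
fi
```

Under these assumptions the wrapper would be launched in place of the executable, for example `mpirun -np 2 -host n001,n002 ./node_env.sh ./my_app`, where my_app stands in for the real program. Each process then computes the value on the node where it starts, so the executable itself is unchanged.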

[OMPI users] Threading fault

2009-02-27 Thread Mahmoud Payami
Dear All, I am using intel lc_prof-11 (and its own mkl) and have built openmpi-1.3.1 with configure options: "FC=ifort F77=ifort CC=icc CXX=icpc". Then I have built my application. The linux box is 2Xamd64 quad. In the middle of running my application (after some 15 iterations), I receive the

Re: [OMPI users] valgrind problems

2009-02-27 Thread Douglas Guptill
On Thu, Feb 26, 2009 at 08:27:15PM -0700, Justin wrote: > Also the stable version of openmpi on Debian is 1.2.7rc2. Are there any > known issues with this version and valgrind? For a now-forgotten reason, I ditched the openmpi that comes on Debian etch, and installed 1.2.8 in /usr/local. HTH, Do

[OMPI users] threading bug?

2009-02-27 Thread Mahmoud Payami
Dear All, I am using intel lc_prof-11 (and its own mkl) and have built openmpi-1.3.1 with configure options: "FC=ifort F77=ifort CC=icc CXX=icpc". Then I have built my application. The linux box is 2Xamd64 quad. In the middle of running my application (after some 15 iterations), I receive the

Re: [OMPI users] defining different values for same environment variable

2009-02-27 Thread Nicolas Deladerriere
Matt, Thanks for your solution, but I thought about that and it is not really convenient in my configuration to change the executable on each node. I would like to change only mpirun command. 2009/2/27 Matt Hughes > > 2009/2/27 Nicolas Deladerriere : > > I am looking for a way to set environm

Re: [OMPI users] OMPI, and HPUX

2009-02-27 Thread Jeff Squyres
I don't know if anyone has tried OMPI on HP-UX, sorry. On Feb 26, 2009, at 9:14 AM, Nader wrote: Hello, Has anyone installed OMPI on an HP-UX system? I would appreciate any info. Best Regards. Nader

Re: [OMPI users] Latest SVN failures

2009-02-27 Thread Rolf Vandevaart
With further investigation, I have reproduced this problem. I think I was originally testing against a version that was not recent enough. I do not see it with r20594 which is from February 19. So, something must have happened over the last 8 days. I will try and narrow down the issue. Rol

Re: [OMPI users] openib RETRY EXCEEDED ERROR

2009-02-27 Thread Jeff Squyres
On Feb 27, 2009, at 12:09 PM, Åke Sandgren wrote: We see these errors fairly frequently on our CentOS 5.2 system with Mellanox InfiniHost III cards. The OFED stack is whatever the CentOS5.2 uses. Has anyone tested that with the 1.4 OFED stack? FWIW, I have tested OMPI's openib BTL with sev

Re: [OMPI users] TCP instead of openIB doesn't work

2009-02-27 Thread Vittorio Giovara
Hello, I've corrected the syntax and added the flag you suggested, but unfortunately the result doesn't change. randori ~ # mpirun --display-map --mca btl tcp,self -np 2 -host randori,tatami graph [randori:22322] Map for job: 1 Generated by mapping mode: byslot Starting vpid: 0 Vpid ra

Re: [OMPI users] Latest SVN failures

2009-02-27 Thread Jeff Squyres
Unfortunately, I think I have reproduced the problem as well -- with SVN trunk HEAD (r20655): [15:12] svbu-mpi:~/mpi % mpirun --mca bogus foo --bynode -np 2 uptime [svbu-mpi.cisco.com:24112] [[62779,0],0] ORTE_ERROR_LOG: Data unpack failed in file base/odls_base_default_fns.c at line 566 ---

Re: [OMPI users] TCP instead of openIB doesn't work

2009-02-27 Thread Jeff Squyres
I notice the following: - you're creating an *enormous* array on the stack. You might be better off allocating it on the heap. - the value of "exchanged" will quickly grow beyond 2^31 - 1 (i.e., INT_MAX), which is the max that the MPI API can handle. Bad Things can/will happen beyond that value (i