[OMPI users] mlx4 error - looking for guidance

2009-03-04 Thread Jeff Layton
Evening everyone, I'm running a CFD code on IB and I've encountered an error I'm not sure about and I'm looking for some guidance on where to start looking. Here's the error: mlx4: local QP operation err (QPN 260092, WQE index 9a9e, vendor syndrome 6f, opcode = 5e) [0,1,6][btl_openib_compon

Re: [OMPI users] libnuma under ompi 1.3

2009-03-04 Thread Doug Reeder
Terry, Is there a libnuma.a on your system? If not, the -static flag to ifort won't do anything because there isn't a static library for it to link against. Doug Reeder On Mar 4, 2009, at 6:06 PM, Terry Frankcombe wrote: Thanks to everyone who contributed. I no longer think this is Open

Re: [OMPI users] libnuma under ompi 1.3

2009-03-04 Thread Terry Frankcombe
Thanks to everyone who contributed. I no longer think this is Open MPI's problem. This system is just stupid. Everything's 64 bit (which various probes with file confirm). There's no icc, so I can't test with that. gcc finds libnuma without -L. (Though a simple gcc -lnuma -Wl,-t reports that

Re: [OMPI users] RETRY EXCEEDED ERROR

2009-03-04 Thread Jan Lindheim
On Wed, Mar 04, 2009 at 04:34:49PM -0500, Jeff Squyres wrote: > On Mar 4, 2009, at 4:16 PM, Jan Lindheim wrote: > > >On Wed, Mar 04, 2009 at 04:02:06PM -0500, Jeff Squyres wrote: > >> This *usually* indicates a physical / layer 0 problem in your IB > >> fabric. You should do a diagnostic on your

Re: [OMPI users] RETRY EXCEEDED ERROR

2009-03-04 Thread Jeff Squyres
On Mar 4, 2009, at 4:16 PM, Jan Lindheim wrote: On Wed, Mar 04, 2009 at 04:02:06PM -0500, Jeff Squyres wrote: > This *usually* indicates a physical / layer 0 problem in your IB > fabric. You should do a diagnostic on your HCAs, cables, and switches. > > Increasing the timeout value should on

Re: [OMPI users] RETRY EXCEEDED ERROR

2009-03-04 Thread Jan Lindheim
On Wed, Mar 04, 2009 at 04:02:06PM -0500, Jeff Squyres wrote: > This *usually* indicates a physical / layer 0 problem in your IB > fabric. You should do a diagnostic on your HCAs, cables, and switches. > > Increasing the timeout value should only be necessary on very large IB > fabrics and/or

Re: [OMPI users] RETRY EXCEEDED ERROR

2009-03-04 Thread Jeff Squyres
This *usually* indicates a physical / layer 0 problem in your IB fabric. You should do a diagnostic on your HCAs, cables, and switches. Increasing the timeout value should only be necessary on very large IB fabrics and/or very congested networks. On Mar 4, 2009, at 3:28 PM, Jan Lindheim w

Re: [OMPI users] Bug reporting [was: OpenMPI 1.3]

2009-03-04 Thread Jeff Squyres
Sorry for the delay; a bunch of higher priority stuff got in the way of finishing this thread. Anyhoo... On Feb 24, 2009, at 4:24 AM, Olaf Lenz wrote: I think it would be also sufficient to place a short text and link to the Trac page, so that the developers that want to use the "Bug Tra

Re: [OMPI users] Gamess with openmpi

2009-03-04 Thread Jeff Squyres
Sorry for the delay in replying -- INBOX deluge makes me miss emails on the users list sometimes. I'm unfortunately not familiar with gamess -- have you checked with their support lists or documentation? Note that Open MPI's IB progression engine will spin hard to make progress for messag

[OMPI users] RETRY EXCEEDED ERROR

2009-03-04 Thread Jan Lindheim
I found several reports on the openmpi users mailing list from users who need to bump up the default value for btl_openib_ib_timeout. We also have some applications on our cluster that have problems unless we raise this value from the default 10 to 15: [24426,1],122][btl_openib_component.c:2905:
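To put the two btl_openib_ib_timeout values from the post above in perspective, here is a minimal sketch that converts the setting into wall-clock time. It assumes the usual InfiniBand convention that the local ACK timeout is 4.096 microseconds * 2^value; the function name ib_ack_timeout_seconds is hypothetical and not part of Open MPI.

```python
# Sketch only: assumes the InfiniBand local ACK timeout is
# 4.096 microseconds * 2**value, where value is btl_openib_ib_timeout.
# The helper name below is hypothetical, not an Open MPI API.

BASE_SECONDS = 4.096e-6  # assumed base unit of the ACK timeout


def ib_ack_timeout_seconds(value: int) -> float:
    """Return the assumed ACK timeout in seconds for a timeout exponent."""
    # The field is 5 bits wide; 0 is treated specially by InfiniBand
    # (no timeout), so this sketch only accepts 1..31.
    if not 1 <= value <= 31:
        raise ValueError("timeout exponent must be in the range 1..31")
    return BASE_SECONDS * (2 ** value)


if __name__ == "__main__":
    for value in (10, 15):
        millis = ib_ack_timeout_seconds(value) * 1e3
        print(f"btl_openib_ib_timeout={value}: {millis:.3f} ms")
```

Under that assumption, going from 10 to 15 raises the per-attempt timeout by a factor of 32, from roughly 4.2 ms to roughly 134 ms. On the command line the value is normally passed as an MCA parameter, e.g. "mpirun --mca btl_openib_ib_timeout 15 ...".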

Re: [OMPI users] mpirun problem

2009-03-04 Thread Ralph Castain
I suppose one initial question is: what version of Open MPI are you running? OMPI 1.3 should not be attempting to ssh a daemon on a local job like this - OMPI 1.2 -will-, so it is important to know which one we are talking about. Just do "mpirun --version" and it should tell you. Ralph O

Re: [OMPI users] mpirun problem

2009-03-04 Thread Jeff Squyres
Sorry for the delay in replying; the usual INBOX deluge keeps me from being timely in replying to all mails... More below. On Feb 24, 2009, at 6:52 AM, Jovana Knezevic wrote: I'm new to MPI, so I'm going to explain my problem in detail I'm trying to compile a simple application using mpicc (

Re: [OMPI users] threading bug?

2009-03-04 Thread Jeff Squyres
On Feb 27, 2009, at 1:56 PM, Mahmoud Payami wrote: I am using intel lc_prof-11 (and its own mkl) and have built openmpi-1.3.1 with configure options: "FC=ifort F77=ifort CC=icc CXX=icpc". Then I have built my application. The linux box is 2Xamd64 quad. In the middle of running of my applic

Re: [OMPI users] openib RETRY EXCEEDED ERROR

2009-03-04 Thread Jeff Squyres
On Mar 1, 2009, at 7:24 PM, Brett Pemberton wrote: I'd appreciate some advice on if I'm using OFED correctly. I'm running OFED 1.4, however not the kernel modules, just userland. Is this a bad idea? I believe so. I'm not a kernel guy, but I've always used the userland bits matched with th

Re: [OMPI users] libnuma under ompi 1.3

2009-03-04 Thread Joshua Bernstein
Terry Frankcombe wrote: Having just downloaded and installed Open MPI 1.3 with ifort and gcc, I merrily went off to compile my application. In my final link with mpif90 I get the error: /usr/bin/ld: cannot find -lnuma Adding --showme reveals that -I/home/terry/bin/Local/include -pthread -I/

Re: [OMPI users] metahosts (like in MP-MPICH)

2009-03-04 Thread Yury Tarasievich
Jeff Squyres wrote: ... In general, you need both OMPI and your application compiled natively for each platform. One easy way to do this is to install Open MPI locally on each node in the same filesystem location (e.g., /opt/openmpi-). You also want exactly the same version of Open MPI on a

Re: [OMPI users] metahosts (like in MP-MPICH)

2009-03-04 Thread Jeff Squyres
On Mar 4, 2009, at 11:38 AM, Yury Tarasievich wrote: I'm not quite sure what an MP-MPICH meta host is. Open MPI allows you to specify multiple hosts in a hostfile and run a single MPI job across all of them, assuming they're connected by at least some common TCP network. What I need is one

Re: [OMPI users] libnuma under ompi 1.3

2009-03-04 Thread Ralph Castain
Problem is that some systems install both 32 and 64 bit support, and build OMPI both ways. So we really can't just figure it out without some help. At our location, we simply take care to specify the -L flag to point to the correct version so we avoid any confusion. On Mar 4, 2009, at 8:

Re: [OMPI users] metahosts (like in MP-MPICH)

2009-03-04 Thread Yury Tarasievich
Jeff Squyres wrote: I'm not quite sure what an MP-MPICH meta host is. Open MPI allows you to specify multiple hosts in a hostfile and run a single MPI job across all of them, assuming they're connected by at least some common TCP network. What I need is one MPI job put for distributed compu

Re: [OMPI users] Low performance of Open MPI-1.3 over Gigabit

2009-03-04 Thread Ralph H. Castain
It would also help to have some idea how you installed and ran this - e.g., did you set mpi_paffinity_alone so that the processes would bind to their processors? That could explain the cpu vs. elapsed time since it helps keep the processes from being swapped out as much. Ralph > Your Intel processors

Re: [OMPI users] Low performance of Open MPI-1.3 over Gigabit

2009-03-04 Thread Mattijs Janssens
Your Intel processors are I assume not the new Nehalem/I7 ones? The older quad-core ones are seriously memory bandwidth limited when running a memory intensive application. That might explain why using all 8 cores per node slows down your calculation. Why do you get such a difference between cp

Re: [OMPI users] libnuma under ompi 1.3

2009-03-04 Thread Prentice Bisbal
Jeff, See my reply to Dr. Frankcombe's original e-mail. I've experienced this same problem with the PGI compilers, so this isn't limited to just the Intel compilers. I provided a fix, but I think OpenMPI should be able to figure out and add the correct linker flags during the configuration/build s

Re: [OMPI users] libnuma under ompi 1.3

2009-03-04 Thread Prentice Bisbal
Terry Frankcombe wrote: > Having just downloaded and installed Open MPI 1.3 with ifort and gcc, I > merrily went off to compile my application. > > In my final link with mpif90 I get the error: > > /usr/bin/ld: cannot find -lnuma > > Adding --showme reveals that > > -I/home/terry/bin/Local/incl

Re: [OMPI users] Lahey 64 bit and openmpi 1.3?

2009-03-04 Thread Jeff Squyres
On Mar 2, 2009, at 10:17 AM, Tiago Silva wrote: Has anyone had success building openmpi with the 64 bit Lahey fortran compiler? I have seen a previous thread about the problems with 1.2.6 and am wondering if any progress has been made. I can build individual libraries by removing -rpath and

Re: [OMPI users] Calculation stuck in MPI

2009-03-04 Thread Jeff Squyres
No, it is not obvious, unfortunately. Can you send all the information listed here: http://www.open-mpi.org/community/help/ On Mar 3, 2009, at 5:22 AM, Ondrej Marsalek wrote: Dear everyone, I have a calculation (the CP2K program) using MPI over Infiniband and it is stuck. All processe

Re: [OMPI users] MPI-IO Inconsistency over Lustre using OMPI 1.3

2009-03-04 Thread Jeff Squyres
Unfortunately, we don't have a whole lot of insight into how the internals of the IO support work -- we mainly bundle the ROMIO package from MPICH2 into Open MPI. Our latest integration was the ROMIO from MPICH2 v1.0.7. Do you see the same behavior if you run your application under MPICH2

Re: [OMPI users] libnuma under ompi 1.3

2009-03-04 Thread Jeff Squyres
Hmm; that's odd. Is icc / icpc able to find libnuma with no -L, but ifort is unable to find it without a -L? On Mar 3, 2009, at 10:00 PM, Terry Frankcombe wrote: Having just downloaded and installed Open MPI 1.3 with ifort and gcc, I merrily went off to compile my application. In my fina

Re: [OMPI users] metahosts (like in MP-MPICH)

2009-03-04 Thread Jeff Squyres
I'm not quite sure what an MP-MPICH meta host is. Open MPI allows you to specify multiple hosts in a hostfile and run a single MPI job across all of them, assuming they're connected by at least some common TCP network. On Mar 4, 2009, at 4:42 AM, Yury Tarasievich wrote: Can't find this i

[OMPI users] metahosts (like in MP-MPICH)

2009-03-04 Thread Yury Tarasievich
Can't find this in FAQ... Can I create the metahost in OpenMPI (a la MP-MPICH), to execute the MPI application simultaneously on several physically different machines connected by TCP/IP? --

Re: [OMPI users] Low performance of Open MPI-1.3 over Gigabit

2009-03-04 Thread Sangamesh B
Hi all, LAM-MPI is now also installed, and I tested the Fortran application with it. But LAM-MPI performs even worse than Open MPI. No. of nodes: 3, cores per node: 8, total cores: 3*8=24. CPU TIME: 1 HOURS 51 MINUTES 23.49 SECONDS, ELAPSED TIME: 7 HOURS 28 MINUTES