[OMPI users] Checkpointing hangs with OpenMPI-1.3.1

2009-04-10 Thread neeraj
Dear All, I am trying to checkpoint a test application using openmpi-1.3.1, but fail to do so when running multiple processes on different nodes. Checkpointing runs fine if the processes are running on the same node as the mpirun process. But the moment I launch an MPI process from a different node,
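
For context, a minimal sketch of the BLCR-based checkpoint/restart workflow in Open MPI 1.3.x, assuming the build was configured with checkpoint/restart support (e.g. --with-ft=cr plus BLCR); the application name and PID are placeholders:

    # launch with the checkpoint/restart machinery enabled
    mpirun -np 4 -am ft-enable-cr ./my_mpi_app
    # from another shell, checkpoint the whole job via the PID of mpirun
    ompi-checkpoint -v <mpirun_pid>
    # restart later from the global snapshot that was written
    ompi-restart ompi_global_snapshot_<mpirun_pid>.ckpt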

[OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-04-10 Thread Geoffroy Pignot
Hi, I am currently testing the process affinity capabilities of openmpi and I would like to know whether the rankfile behaviour I describe below is normal or not.
cat hostfile.0
r011n002 slots=4
r011n003 slots=4
cat rankfile.0
rank 0=r011n002 slot=0
rank 1=r011n003 slot=1
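
A hedged sketch of how such a rankfile is normally passed to mpirun together with the hostfile (the binary name is a placeholder):

    # bind rank 0 to slot 0 of r011n002 and rank 1 to slot 1 of r011n003
    mpirun -np 2 -hostfile hostfile.0 -rf rankfile.0 ./my_mpi_app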

[OMPI users] openmpi src rpm and message coalesce

2009-04-10 Thread vkm
Hi, I was trying to understand how "btl_openib_use_message_coalescing" works. For a certain test scenario, IMB-EXT works if I use "btl_openib_use_message_coalescing = 0" but not with "btl_openib_use_message_coalescing = 1". No idea who has the bug here, either open-mpi or the low-le
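
The setting in question is an MCA parameter of the openib BTL; a hedged sketch of the two usual ways to set it (the benchmark invocation is illustrative):

    # disable message coalescing for a single run
    mpirun -np 2 --mca btl_openib_use_message_coalescing 0 ./IMB-EXT
    # or set it persistently for the user
    echo "btl_openib_use_message_coalescing = 0" >> $HOME/.openmpi/mca-params.conf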

Re: [OMPI users] shared libraries issue compiling 1.3.1/intel 10.1.022

2009-04-10 Thread Francesco Pietra
Hi Gus: If you feel that the observations below are not relevant to openmpi, please disregard the message. You have already kindly devoted so much time to my problems. The "limits.h" issue is solved with the 10.1.022 Intel compilers: as I suspected, the problem was with the pre-10.1.021 version of the int

Re: [OMPI users] shared libraries issue compiling 1.3.1/intel10.1.022

2009-04-10 Thread Jeff Squyres
See this FAQ entry: http://www.open-mpi.org/faq/?category=running#intel-compilers-static On Apr 10, 2009, at 12:16 PM, Francesco Pietra wrote: Hi Gus: If you feel that the observations below are not relevant to openmpi, please disregard the message. You have already kindly devoted so

[OMPI users] Fwd: shared libraries issue compiling 1.3.1/intel 10.1.022

2009-04-10 Thread Francesco Pietra
Sorry, the first line of the output below (copied manually) should read /usr/local/bin/mpirun -host deb64 -n 4 connectivity_c 2>&1 | tee connectivity.ou -- Forwarded message -- From: Francesco Pietra Date: Fri, Apr 10, 2009 at 6:16 PM Subject

Re: [OMPI users] shared libraries issue compiling 1.3.1/intel 10.1.022

2009-04-10 Thread Mostyn Lewis
If you want to find libimf.so, which is a shared Intel library, pass the library path with -x on mpirun: mpirun -x LD_LIBRARY_PATH DM On Fri, 10 Apr 2009, Francesco Pietra wrote: Hi Gus: If you feel that the observations below are not relevant to openmpi, please disregard the mes
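
A hedged example of how that flag combines with the command used earlier in this thread (host and binary names are taken from the forwarded post):

    # export LD_LIBRARY_PATH to the remote node so libimf.so is found at runtime
    mpirun -x LD_LIBRARY_PATH -host deb64 -n 4 connectivity_c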

Re: [OMPI users] Fwd: shared libraries issue compiling 1.3.1/intel 10.1.022

2009-04-10 Thread Gus Correa
Hi Francesco Let's concentrate on the Intel shared libraries problem for now. The FAQ Jeff sent you summarizes what I told you before. You need to set up your Intel environment (on deb64) to work with mpirun. You need to insert these commands on your .bashrc (most likely you use bash) or .cshrc
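
A hedged sketch of what those .bashrc lines usually look like for the 10.1 Intel compilers; the install prefixes below are assumptions and must be adjusted to the actual location on deb64:

    # load the Intel Fortran and C/C++ environments so libimf.so can be found
    source /opt/intel/fce/10.1.022/bin/ifortvars.sh
    source /opt/intel/cce/10.1.022/bin/iccvars.sh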

Re: [OMPI users] mpirun: symbol lookup error: /usr/local/lib/openmpi/mca_plm_lsf.so: undefined symbol: lsb_init

2009-04-10 Thread Jeff Squyres
On Apr 1, 2009, at 12:00 PM, Alessandro Surace wrote: Hi guys, I try to repost my question... I've a problem with the last stable build and the last nightly snapshot. When I run a job directly with mpirun, no problem. If I try to submit it with LSF: bsub -a openmpi -m grid01 mpirun.lsf /mnt/e
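
A hedged way to diagnose this kind of undefined-symbol error: lsb_init is provided by the LSF libraries (libbat/liblsf), so check what the LSF launcher plugin links against and make sure the LSF library directory is on LD_LIBRARY_PATH (paths below are placeholders):

    # see whether liblsf/libbat resolve for the LSF launcher plugin
    ldd /usr/local/lib/openmpi/mca_plm_lsf.so
    # if they do not, point the dynamic linker at the LSF libraries, e.g.
    export LD_LIBRARY_PATH=/path/to/lsf/lib:$LD_LIBRARY_PATH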

Re: [OMPI users] Problems configuring OpenMPI 1.3.1 with numa, torque, and openib

2009-04-10 Thread Jeff Squyres
On Apr 9, 2009, at 6:16 PM, Gus Correa wrote: The configure scripts seem to have changed, and work differently than before, particularly w.r.t. additional libraries like numa, torque, and openib. The new behavior can be a bit unexpected and puzzled me, although eventually I could build 1.3.1. Y
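
A hedged sketch of a 1.3.1 configure line that requests those optional libraries explicitly rather than relying on auto-detection (the install prefixes are placeholders):

    ./configure --prefix=/usr/local/openmpi-1.3.1 \
        --with-libnuma=/usr \
        --with-tm=/usr/local/torque \
        --with-openib=/usr
    make all install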

Re: [OMPI users] Fwd: shared libraries issue compiling 1.3.1/intel 10.1.022

2009-04-10 Thread Francesco Pietra
Hi Gus: Please see below while I go study what Jeff suggested. On Fri, Apr 10, 2009 at 6:51 PM, Gus Correa wrote: > Hi Francesco > > Let's concentrate on the Intel shared libraries problem for now. > > The FAQ Jeff sent you summarizes what I told you before. > > You need to set up your Intel en

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-10 Thread Jeff Squyres
On Apr 7, 2009, at 4:25 PM, Mostyn Lewis wrote: Does OpenMPI know about the number of CPUs per node for FreeBSD? This is exactly the right question: apparently it does not. Specifically, it looks like we have a bad configure test in the "posix" paffinity component which triggers it to not
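
A hedged way to compare what the node itself reports with what Open MPI detects (commands for FreeBSD and Linux respectively; both are standard system tools):

    # FreeBSD: CPU count as reported by the kernel
    sysctl -n hw.ncpu
    # Linux/POSIX equivalent
    getconf _NPROCESSORS_ONLN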

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-10 Thread Steve Kargl
On Fri, Apr 10, 2009 at 05:10:29PM -0400, Jeff Squyres wrote: > On Apr 7, 2009, at 4:25 PM, Mostyn Lewis wrote: > > >Does OpenMPI know about the number of CPUS per node for FreeBSD? > > > > This is exactly the right question: apparently it does not. > > Specifically, it looks like we have a bad

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-10 Thread Jeff Squyres
On Apr 10, 2009, at 6:10 PM, Steve Kargl wrote: > I'll fix. I don't know if it'll make the cut for 1.3.2 or not. I applied your patch to openmpi-1.3.2a1r20942. It built fine and running my test indicates that it fixes the problem. Excellent. :-) -- Jeff Squyres Cisco Systems

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-10 Thread Jeff Squyres
On Apr 10, 2009, at 5:30 PM, Steve Kargl wrote: Thanks for looking into this issue. As a side note, FreeBSD 7.1 and higher have the cpuset_getaffinity/cpuset_setaffinity system calls. I suspect that at some point openmpi can have an opal/mca/paffinity/freebsd directory with an appropriate set of

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-10 Thread Steve Kargl
On Fri, Apr 10, 2009 at 06:13:43PM -0400, Jeff Squyres wrote: > On Apr 10, 2009, at 5:30 PM, Steve Kargl wrote: > > >Thanks for looking into this issue. As a side note, FreeBSD 7.1 > >and higher has the cpuset_getaffinity/cpuset_setaffinity system > >calls. I suspect that at some point openmpi c

Re: [OMPI users] Problems configuring OpenMPI 1.3.1 with numa, torque, and openib

2009-04-10 Thread Gus Correa
Hi Jeff, Thank you very much for the thorough explanation. The OpenMPI configure script rationale and design, as you described them, are wise and clear. They avoid tricking the user or making decisions that he/she may not want, but make the right decisions when the user defers them to OpenMPI. I