Re: [OMPI users] OpenMPI 1.3 Infiniband Hang

2009-08-19 Thread Allen Barnett
Hi: Setting mpi_leave_pinned to 0 allows my application to run to completion when running with openib active. I realize that it's probably not going to help my application's performance, but since "ON" is the default, I'd like to understand what's happening. There's definitely a dependence on probl
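The workaround described above is a single run-time MCA parameter. A hedged sketch (the application name and process count are placeholders, not from the thread):

```shell
# Disable registered-memory ("leave pinned") caching at run time, which the
# poster reports lets the job run to completion with the openib BTL active.
# ./my_app and -np 16 are placeholders.
mpirun --mca mpi_leave_pinned 0 -np 16 ./my_app
```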

Re: [OMPI users] OpenMPI 1.3 Infiniband Hang

2009-08-13 Thread Lenny Verkhovsky
Hi, 1. Mellanox has newer fw for those HCAs: http://www.mellanox.com/content/pages.php?pg=firmware_table_IH3Lx I am not sure if it will help, but newer fw usually has some bug fixes. 2. Try to disable leave_pinned during the run; it's on by default in 1.3.3. Lenny. On Thu, Aug 13, 2009 at 5:1

[OMPI users] OpenMPI 1.3 Infiniband Hang

2009-08-12 Thread Allen Barnett
Hi: I recently tried to build my MPI application against OpenMPI 1.3.3. It worked fine with OMPI 1.2.9, but with OMPI 1.3.3, it hangs part way through. It does a fair amount of comm, but eventually it stops in a Send/Recv point-to-point exchange. If I turn off the openib btl, it runs to completion.

Re: [OMPI users] OpenMPI 1.3.X Configuration for OFED

2009-05-07 Thread Jeff Squyres

Re: [OMPI users] OpenMPI 1.3.X Configuration for OFED

2009-05-07 Thread pat.o'bryant

[OMPI users] OpenMPI 1.3.X Configuration for OFED

2009-05-07 Thread pat.o'bryant
I am in the process of building a production system with OpenMPI 1.3.2 with support for OFED. Is it necessary in the "configure" statement to specify "--with-openib(=DIR)" to get OFED support? I have built a test system with OpenMPI 1.3.2 and an "ompi_info" yields the output below. It appears
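For reference, a hedged sketch of a build that requests OFED support explicitly and then verifies it; the install prefix and the OFED location are assumptions, not from the message:

```shell
# Configure Open MPI 1.3.2 with the openib BTL against an OFED installation,
# then confirm the component was actually built. Paths are placeholders.
./configure --prefix=/opt/openmpi-1.3.2 --with-openib=/usr
make all install
/opt/openmpi-1.3.2/bin/ompi_info | grep openib
```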

Re: [OMPI users] OpenMPI 1.3 and SGE 6.2u1

2009-03-19 Thread Rolf Vandevaart
Hi, On 19.03.2009 at 16:07, Malone, Scott wrote: I am having two problems with the integration of OpenMPI 1.3 and SGE 6.2u1, both of which are new to

Re: [OMPI users] OpenMPI 1.3 and SGE 6.2u1

2009-03-19 Thread Ralph Castain
Hi, On 19.03.2009 at 16:07, Malone, Scott wrote: I am having two problems with the integration of OpenMPI 1.3 and

Re: [OMPI users] OpenMPI 1.3 and SGE 6.2u1

2009-03-19 Thread Malone, Scott
> Hi, > > On 19.03.2009 at 16:07, Malone, Scott wrote: > > > I am having two problems with the integration of OpenMPI 1.3 and SGE > > 6.2u1, both of which are new to us.

Re: [OMPI users] OpenMPI 1.3 and SGE 6.2u1

2009-03-19 Thread Reuti
Hi, On 19.03.2009 at 16:07, Malone, Scott wrote: I am having two problems with the integration of OpenMPI 1.3 and SGE 6.2u1, both of which are new to us. The troubles are getting jobs to suspend/resume and to collect CPU time correctly. For suspend/resume I have added the following to my mp

[OMPI users] OpenMPI 1.3 and SGE 6.2u1

2009-03-19 Thread Malone, Scott
I am having two problems with the integration of OpenMPI 1.3 and SGE 6.2u1, both of which are new to us. The troubles are getting jobs to suspend/resume and to collect CPU time correctly. For suspend/resume I have added the following to my mpirun command: --mca orte_forward_job_control 1 --mca plm_
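The first flag quoted above would be passed like this; a hedged sketch, since the second --mca parameter is truncated in the archive ("plm_...") and is therefore left out:

```shell
# Forward job-control signals (SIGTSTP/SIGCONT) from SGE to the MPI job so
# that suspending the SGE job suspends the MPI processes. The application
# name and -np count are placeholders; the truncated "plm_..." parameter
# from the original message is omitted rather than guessed.
mpirun --mca orte_forward_job_control 1 -np 4 ./my_app
```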

Re: [OMPI users] openmpi 1.3 and gridengine tight integrationproblem

2009-03-18 Thread Rene Salmon
> > At this FAQ, we show an example of a parallel environment setup. > http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge > > I am wondering if the control_slaves needs to be TRUE. > And double check the that the PE (pavtest) is on the list for the > queue > (also mentioned at the FAQ
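The checks suggested above can be done with qconf; a hedged sketch using the PE name from the thread ("pavtest") and an assumed queue name:

```shell
# Verify the SGE parallel environment allows tight integration and is
# attached to the queue. "all.q" is an assumed queue name; control_slaves
# should report TRUE and pavtest should appear in the queue's pe_list.
qconf -sp pavtest | grep control_slaves
qconf -sq all.q | grep pe_list
```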

Re: [OMPI users] openmpi 1.3 and gridengine tight integrationproblem

2009-03-18 Thread Rolf Vandevaart
On 03/18/09 09:52, Reuti wrote: Hi, On 18.03.2009 at 14:25, Rene Salmon wrote: Thanks for the help. I only use the machine file to run outside of SGE, just to test/prove that things work outside of SGE. Aha. Did you compile Open MPI 1.3 with the SGE option? When I run within SGE here is

Re: [OMPI users] openmpi 1.3 and gridengine tight integrationproblem

2009-03-18 Thread Rene Salmon
> > aha. Did you compile Open MPI 1.3 with the SGE option? > Yes I did. hpcp7781(salmr0)142:ompi_info |grep grid MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3) > > > setenv LD_LIBRARY_PATH /bphpc7/vol0/salmr0/ompi/lib > > Maybe you have to set this LD_LIBRARY_PAT

Re: [OMPI users] openmpi 1.3 and gridengine tight integrationproblem

2009-03-18 Thread Reuti
Hi, On 18.03.2009 at 14:25, Rene Salmon wrote: Thanks for the help. I only use the machine file to run outside of SGE, just to test/prove that things work outside of SGE. Aha. Did you compile Open MPI 1.3 with the SGE option? When I run within SGE here is what the job script looks like

Re: [OMPI users] openmpi 1.3 and gridengine tight integrationproblem

2009-03-18 Thread Rene Salmon
Hi, Thanks for the help. I only use the machine file to run outside of SGE, just to test/prove that things work outside of SGE. When I run within SGE here is what the job script looks like: hpcp7781(salmr0)128:cat simple-job.sh #!/bin/csh # #$ -S /bin/csh setenv LD_LIBRARY_PATH /bphpc7/vol0/sal

Re: [OMPI users] openmpi 1.3 and gridengine tight integration problem

2009-03-18 Thread Reuti
Hi, it shouldn't be necessary to supply a machinefile, as the one generated by SGE is taken automatically (i.e. the granted nodes are honored). You submitted the job requesting a PE? -- Reuti Am 18.03.2009 um 04:51 schrieb Salmon, Rene: Hi, I have looked through the list archives and

[OMPI users] openmpi 1.3 and gridengine tight integration problem

2009-03-17 Thread Salmon, Rene
Hi, I have looked through the list archives and Google but could not find anything related to what I am seeing. I am simply trying to run the basic cpi.c code using SGE and tight integration. If run outside SGE I can run my jobs just fine: hpcp7781(salmr0)132:mpiexec -np 2 --machinefile x a.ou

Re: [OMPI users] OpenMPI 1.3:

2009-02-24 Thread Jeff Squyres
On Feb 24, 2009, at 4:43 AM, Olaf Lenz wrote: We've talked about similar errors before; I thought that the issue was caused by the Python front-end calling dlopen() to manually open the libmpi.so library. Is that the cause in your scenario? Not really. We have written a shared library _esp

Re: [OMPI users] OpenMPI 1.3:

2009-02-24 Thread Olaf Lenz
Hi! Only now have I realized that there is another mailing which seems to describe pretty much the same problem as mine (Subject: "Problems in 1.3..."). I wonder why I didn't see it when I searched for the bug before, and I'm sorry if I have started the subject all over again. I will

Re: [OMPI users] OpenMPI 1.3:

2009-02-24 Thread Olaf Lenz
Hi! And now for the actual mailing. Jeff Squyres wrote: We've talked about similar errors before; I thought that the issue was caused by the Python front-end calling dlopen() to manually open the libmpi.so library. Is that the cause in your scenario? Not really. We have written a shared lib

Re: [OMPI users] OpenMPI 1.3:

2009-02-23 Thread Jeff Squyres
On Feb 20, 2009, at 9:53 AM, Olaf Lenz wrote: However, I'm using OpenMPI to run a program that we currently develop (http://www.espresso-pp.de). The software uses Python as a front-end language, which loads the MPI-enabled shared library. When I start python with a script using this parallel lib

[OMPI users] openmpi 1.3: undefined symbol: mca_base_param_reg_int [was: Re: OpenMPI 1.3:]

2009-02-20 Thread Olaf Lenz
Hi again! Sorry for messing up the subject. Also, I wanted to attach the output of ompi_info -all. Olaf

[OMPI users] OpenMPI 1.3:

2009-02-20 Thread Olaf Lenz
Hello! I have compiled OpenMPI 1.3 with configure --prefix=$HOME/software The compilation works fine, and I can run normal MPI programs. However, I'm using OpenMPI to run a program that we currently develop (http://www.espresso-pp.de). The

Re: [OMPI users] openmpi-1.3: segmentation fault using Cygwin-1.5 and gcc-3.4.4

2009-02-12 Thread George Bosilca
Windows does not have ssh installed natively, so we didn't bother making sure we can start MPI applications using rsh/ssh. There is a special PLM (process starter) for Windows called "process", but I haven't tested it in a long time. Anyway, it only allows you to start local MPI jobs. george.

Re: [OMPI users] openmpi-1.3: segmentation fault using Cygwin-1.5 and gcc-3.4.4

2009-02-12 Thread Ralph Castain
I don't believe we support cygwin at this time...native Windows support is coming in a later release. On Feb 12, 2009, at 7:09 AM, Siegmar Gross wrote: Hi, I have installed openmpi-1.3 using Cygwin 1.5 on top of Windows XP Pro with gcc-3.4.4 with the following commands. At first I added "#d

[OMPI users] openmpi-1.3: segmentation fault using Cygwin-1.5 and gcc-4.3.2

2009-02-12 Thread Siegmar Gross
Hi, I have installed openmpi-1.3 using Cygwin 1.5 on top of Windows XP Pro with gcc-4.3.2 with the following commands. At first I added "#define NOMINMAX" before the line "#include MCA_timer_IMPLEMENTATION_HEADER" and "#undef NOMINMAX" after that line in file "ompi/tools/ompi_info/param.cc" as de

[OMPI users] openmpi-1.3: segmentation fault using Cygwin-1.5 and gcc-3.4.4

2009-02-12 Thread Siegmar Gross
Hi, I have installed openmpi-1.3 using Cygwin 1.5 on top of Windows XP Pro with gcc-3.4.4 with the following commands. At first I added "#define NOMINMAX" before the line "#include MCA_timer_IMPLEMENTATION_HEADER" and "#undef NOMINMAX" after that line in file "ompi/tools/ompi_info/param.cc" as de
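The source edit described above amounts to wrapping one include; a hedged sketch of the pattern, with the include line as quoted from ompi/tools/ompi_info/param.cc:

```cpp
// NOMINMAX keeps <windows.h> from defining min()/max() macros that collide
// with std::min/std::max in C++ sources; define it only around the header
// that pulls windows.h in, then undo it so other headers are unaffected.
#define NOMINMAX
#include MCA_timer_IMPLEMENTATION_HEADER
#undef NOMINMAX
```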

Re: [OMPI users] Openmpi 1.3 problems with libtool-ltdl on CentOS 4 and 5

2009-01-31 Thread Jeff Squyres
On Jan 30, 2009, at 4:25 PM, Roy Dragseth wrote: I'm not very familiar with the workings of ltdl; I got this from one of our users. Would you suggest that if one uses openmpi 1.3 and ltdl, one should not explicitly link with -lltdl? At least this seems to work correctly with the example I p

Re: [OMPI users] Openmpi 1.3 problems with libtool-ltdl on CentOS 4 and 5

2009-01-30 Thread Roy Dragseth
On Friday 23 January 2009 15:31:59 Jeff Squyres wrote: > Ew. Yes, I can see this being a problem. > > I'm guessing that the real issue is that OMPI embeds the libltdl from > LT 2.2.6a inside libopen_pal (one of the internal OMPI libraries). > Waving my hands a bit, but it's not hard to imagine som

Re: [OMPI users] OpenMPI-1.3 and XGrid

2009-01-27 Thread Jeff Squyres
Thanks for reporting this Frank -- looks like we borked a symbol in the xgrid component in 1.3. It seems that the compiler doesn't complain about the missing symbol; it only shows up when you try to *run* with it. Whoops! I filed ticket https://svn.open-mpi.org/trac/ompi/ticket/1777 about

[OMPI users] OpenMPI-1.3 and XGrid

2009-01-23 Thread Frank Kahle
I'm running OpenMPI on OS X 10.4.11. After upgrading to OpenMPI-1.3 I get the following error when submitting a job via XGrid: dyld: lazy symbol binding failed: Symbol not found: _orte_pointer_array_add Referenced from: /usr/local/mpi/lib/openmpi/mca_plm_xgrid.so Expected in: flat namespace

Re: [OMPI users] Openmpi 1.3 problems with libtool-ltdl on CentOS 4 and 5

2009-01-23 Thread Jeff Squyres
Ew. Yes, I can see this being a problem. I'm guessing that the real issue is that OMPI embeds the libltdl from LT 2.2.6a inside libopen_pal (one of the internal OMPI libraries). Waving my hands a bit, but it's not hard to imagine some sort of clash is going on between the -lltdl you added

[OMPI users] Openmpi 1.3 problems with libtool-ltdl on CentOS 4 and 5

2009-01-23 Thread Roy Dragseth
Hi, all. I do not know if this is to be considered a real bug or not, I'm just reporting it here so people can find it if they google around for the error message this produces. There is a backtrace at the end of this mail. Problem description: Openmpi 1.3 seems to be nonfunctional when used

Re: [OMPI users] openmpi 1.3 and --wdir problem

2009-01-21 Thread Ralph Castain
This is now fixed in the trunk and will be in the 1.3.1 release. Thanks again for the heads-up! Ralph On Jan 21, 2009, at 8:45 AM, Ralph Castain wrote: You are correct - that is a bug in 1.3.0. I'm working on a fix for it now and will report back. Thanks for catching it! Ralph On Jan 21,

Re: [OMPI users] openmpi 1.3 and --wdir problem

2009-01-21 Thread Ralph Castain
You are correct - that is a bug in 1.3.0. I'm working on a fix for it now and will report back. Thanks for catching it! Ralph On Jan 21, 2009, at 3:22 AM, Geoffroy Pignot wrote: Hello, I'm currently trying the new release but I can't reproduce the 1.2.8 behaviour concerning --wdir o

[OMPI users] openmpi 1.3 and --wdir problem

2009-01-21 Thread Geoffroy Pignot
Hello, I'm currently trying the new release but I can't reproduce the 1.2.8 behaviour concerning the --wdir option. Then %% /tmp/openmpi-1.2.8/bin/mpirun -n 1 --wdir /tmp --host r003n030 pwd : --wdir /scr1 -n 1 --host r003n031 pwd /scr1 /tmp but %% /tmp/openmpi-1.3/bin/mpirun -n
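The comparison being made is between the same multi-app-context command line on 1.2.8 and 1.3; a hedged sketch of the invocation reconstructed from the transcript above (host names as quoted):

```shell
# Two app contexts separated by ":", each with its own --wdir, so under
# 1.2.8 the two ranks print /tmp and /scr1 (one per rank). The 1.3 command
# in the message is truncated, so only the 1.2.8 form is shown.
mpirun -n 1 --wdir /tmp --host r003n030 pwd : -n 1 --wdir /scr1 --host r003n031 pwd
```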