Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-07-20 Thread Ralph Castain
Looks like it got overlooked - I have requested a complete refresh of the rank_file mapper for 1.3.4. Sorry for the oversight. Ralph On Jul 16, 2009, at 2:25 AM, Geoffroy Pignot wrote: Hi, I did my classic test (see below) with 1.3.3, and unfortunately it doesn't work. It seems that
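As context for the thread, the rank_file mapper takes a plain-text rankfile selected with mpirun's -rf option; a minimal sketch (hostnames, slot numbers and the executable name are hypothetical, and whether 1.3.x honours them correctly is exactly what this thread is testing):

  rank 0=node1 slot=0
  rank 1=node1 slot=1
  rank 2=node2 slot=0

  mpirun -np 3 -rf my_rankfile ./a.out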

[OMPI users] Profiling performance by forcing transport choice.

2009-07-20 Thread Nifty Tom Mitchell
On Thu, Jun 25, 2009 at 08:37:21PM -0400, Jeff Squyres wrote: > Subject: Re: [OMPI users] 50% performance reduction due to OpenMPI v1.3.2 forcing all MPI traffic over Ethernet instead of using Infiniband While the previous thread on "performance reduction" went left, right, forward and
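To make the transport comparison concrete, the usual way to pin Open MPI to one interconnect for profiling is the btl MCA parameter; a sketch (process count and binary name are hypothetical):

  # force TCP over Ethernet
  mpirun --mca btl tcp,self -np 16 ./app
  # force the verbs transport (InfiniBand/iWARP) plus shared memory
  mpirun --mca btl openib,sm,self -np 16 ./app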

Re: [OMPI users] ifort and gfortran module

2009-07-20 Thread Martin Siegert
Hi, I want to avoid separate MPI distributions since we compile many MPI software packages. Having more than one MPI distribution (at least) doubles the amount of work. For now I came up with the following solution: 1. compile openmpi using gfortran as the Fortran compiler and install it in /
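A sketch of that first step, building Open MPI with gfortran as the Fortran compiler (the install prefix is hypothetical; F77/FC are the standard configure variables):

  ./configure --prefix=/opt/openmpi-1.3.3-gfortran CC=gcc CXX=g++ F77=gfortran FC=gfortran
  make all install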

[OMPI users] MPI_Barrier called late within ompi_mpi_finalize when MPIIO fd not closed

2009-07-20 Thread Jed Brown
This helped me track down a leaked file descriptor, but I think the order of events is not desirable. If an MPIIO file descriptor is not closed before MPI_Finalize, I get the following. *** An error occurred in MPI_Barrier *** after MPI was finalized *** MPI_ERRORS_ARE_FATAL (your MPI job will n
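For reference, a minimal C sketch of the pattern that avoids the late MPI_Barrier described here, i.e. closing the MPI-IO handle before MPI_Finalize (the filename is hypothetical):

  #include <mpi.h>
  int main(int argc, char **argv)
  {
      MPI_File fh;
      MPI_Init(&argc, &argv);
      MPI_File_open(MPI_COMM_WORLD, "out.dat",
                    MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
      /* ... file I/O ... */
      MPI_File_close(&fh);  /* close before MPI_Finalize; otherwise the cleanup
                               inside ompi_mpi_finalize triggers the late
                               MPI_Barrier error shown above */
      MPI_Finalize();
      return 0;
  }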

Re: [OMPI users] Possible openmpi bug?

2009-07-20 Thread Steven Dale
Tried adjusting oob_tcp_peer_retries. Same result. I don't think it's a memory limitation... I've got 64GB per box and this is only taking about 5GB. I've got no limitations set on a per-job or per-process basis. Steve Dale Senior Platform Analyst Health Canada Phone: (

Re: [OMPI users] Possible openmpi bug?

2009-07-20 Thread Ralph Castain
Try adjusting this: oob_tcp_peer_retries = 10 to be oob_tcp_peer_retries = 1000 It should have given you an error if this failed, but let's give it a try anyway. You might also check to see if you are hitting memory limitations. If so, or if you just want to try anyway, try reducing the value o
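The suggested value can be set on the command line or persistently in an MCA parameter file; a sketch (application name is hypothetical):

  mpirun --mca oob_tcp_peer_retries 1000 -np 4 ./app

  # or in $HOME/.openmpi/mca-params.conf
  oob_tcp_peer_retries = 1000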

Re: [OMPI users] Possible openmpi bug?

2009-07-20 Thread Steven Dale
Okay, now the plot is just getting weirder. I implemented most of the changes you recommended below. We are not running Panasas, and our network is Gb Ethernet only, so I left the openib parameters out as well. I also recompiled with the switches suggested in the tlcc directory for the non-panasa

Re: [OMPI users] ifort and gfortran module

2009-07-20 Thread Dave Love
rahmani writes: > Hi, > you should compile openmpi with each of intel and gfortran separately > and install each of them in a separate location, and use mpi-selector > to select one. What, precisely, requires that, at least if you can recompile the MPI program with appropriate options? (Presumab
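For completeness, the workflow the quoted advice refers to looks roughly like this with the OFED mpi-selector tool (the stack name shown is hypothetical):

  mpi-selector --list
  mpi-selector --set openmpi-1.3.3-intel --user
  mpi-selector --query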

Re: [OMPI users] [Open MPI Announce] Open MPI v1.3.3 released

2009-07-20 Thread Dave Love
Ralph Castain writes: > Hmmm...there should be messages on both the user and devel lists > regarding binary compatibility at the MPI level being promised for > 1.3.2 and beyond. This is confusing. As I read the quotes below, recompilation is necessary, and the announcement has items which sugge

[OMPI users] OpenMPI.1.3.2 : PML add procs failed error while running with -mca btl openib, self, sm

2009-07-20 Thread Hardik Patel
Hi All, We are running Open MPI 1.3.2 with OFED 1.5. We have an 8-node cluster with 10Gb iWARP Ethernet cards. The node names are as below: n130, n131, n132, n133, n134, n135, n136, n137. The respective 10Gb hostnames are n130x, n131x, ... n137x. We have a /root/mpd.hosts entry as below: n130x n131x n134x n135
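A common first step with this kind of "PML add procs failed" error is to compare the failing BTL list against a TCP-only run on the same hosts; a sketch (treating /root/mpd.hosts as an Open MPI hostfile and using a hypothetical binary name):

  # the combination from the subject line
  mpirun -np 8 --hostfile /root/mpd.hosts --mca btl openib,sm,self ./app
  # sanity check over TCP on the 10Gb interfaces
  mpirun -np 8 --hostfile /root/mpd.hosts --mca btl tcp,sm,self ./app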

[OMPI users] OpenMPI.1.3.2 : PML add procs failed error while running with -mca btl openib, self, sm

2009-07-20 Thread Kartik
Hi, We are running Open MPI 1.3.2 with OFED 1.5. We have an 8-node cluster with 10Gb iWARP Ethernet cards. The node names are as below: n130, n131, n132, n133, n134, n135, n136, n137. The respective 10Gb hostnames are n130x, n131x, ... n137x. We have a /root/mpd.hosts entry as below: n130x n131x n134x n135x n1