Re: [OMPI users] v1.3: mca_common_sm_mmap_init error

2009-03-19 Thread Ralph Castain
Or you can just add --leave-session-attached to your mpirun cmd line On Mar 19, 2009, at 8:10 AM, Rolf Vandevaart wrote: On 03/19/09 09:55, Dave Love wrote: Prentice Bisbal writes: I just installed OpenMPI 1.3 with tight integration for SGE. Version 1.2.8 was working just fine for several m

Re: [OMPI users] v1.3: mca_common_sm_mmap_init error

2009-03-19 Thread Rolf Vandevaart
On 03/19/09 09:55, Dave Love wrote: Prentice Bisbal writes: I just installed OpenMPI 1.3 with tight integration for SGE. Version 1.2.8 was working just fine for several months in the same arrangement. Now that I've upgraded to 1.3, I get the following errors in my standard error file: mca_co

Re: [OMPI users] v1.3: mca_common_sm_mmap_init error

2009-03-19 Thread Dave Love
Prentice Bisbal writes: > I just installed OpenMPI 1.3 with tight integration for SGE. Version > 1.2.8 was working just fine for several months in the same arrangement. > > Now that I've upgraded to 1.3, I get the following errors in my standard > error file: > > mca_common_sm_mmap_init: open /tm

Re: [OMPI users] v1.3: mca_common_sm_mmap_init error

2009-02-17 Thread Daniel De Marco
* Rolf Vandevaart [02/17/2009 11:32]: > There is a ticket for this. > > https://svn.open-mpi.org/trac/ompi/ticket/1783 > > I am working on it. I do not have a workaround. I had a fix but ran into > some issues with getting the -notify flag to work right with a > non-daemonized orted. > > Fix w

Re: [OMPI users] v1.3: mca_common_sm_mmap_init error

2009-02-17 Thread Rolf Vandevaart
On 02/17/09 11:18, Daniel De Marco wrote: Hi, * Reuti [02/02/2009 03:43]: But despite the fact that SGE's qrsh is used automatically, more severe is the fact, that on the slave nodes the orted daemons will be pushed into daemonland and no longer be bound to the sge_shepherd: 3173 1 /usr/

Re: [OMPI users] v1.3: mca_common_sm_mmap_init error

2009-02-17 Thread Daniel De Marco
Hi, * Reuti [02/02/2009 03:43]: > But despite the fact that SGE's qrsh is used automatically, more > severe is the fact, that on the slave nodes the orted daemons will be > pushed into daemonland and no longer be bound to the sge_shepherd: > > 3173 1 /usr/sge/bin/lx24-x86/sge_execd > 3431

Re: [OMPI users] v1.3: mca_common_sm_mmap_init error

2009-02-04 Thread Dave Love
Jeff Squyres writes: > Could the nodes be running out of shared memory and/or temp filesystem > space? I'm also seeing this non-reproducibly (on OpenSuSE 10.3, with Sun's Clustertools 8.1 prerelease on dual Barcelona nodes during PMB runs under SGE). I haven't had time to build the final 1.3 re

Re: [OMPI users] v1.3: mca_common_sm_mmap_init error

2009-02-03 Thread Eugene Loh
Prentice Bisbal wrote: Jeff Squyres wrote: On Feb 2, 2009, at 4:48 PM, Prentice Bisbal wrote: No. I was running just a simple "Hello, world" program to test v1.3 when these errors occurred. And as soon as I reverted to 1.2.8, the errors disappeared. FW

Re: [OMPI users] v1.3: mca_common_sm_mmap_init error

2009-02-03 Thread Jeff Squyres
By peers I mean the number of MPI processes on the same host. So if you run with 4 processes on a single host, OMPI sets up shared memory for those 4 processes during MPI_INIT, regardless of whether you call MPI send/receive functions or not. On Feb 3, 2009, at 1:15 PM, Prentice Bisbal wr

Re: [OMPI users] v1.3: mca_common_sm_mmap_init error

2009-02-03 Thread Prentice Bisbal
Jeff Squyres wrote: > On Feb 2, 2009, at 4:48 PM, Prentice Bisbal wrote: > >> No. I was running just a simple "Hello, world" program to test v1.3 when >> these errors occurred. And as soon as I reverted to 1.2.8, the errors >> disappeared. > > FWIW, OMPI allocates shared memory based on the number

Re: [OMPI users] v1.3: mca_common_sm_mmap_init error

2009-02-03 Thread Jeff Squyres
On Feb 2, 2009, at 4:48 PM, Prentice Bisbal wrote: No. I was running just a simple "Hello, world" program to test v1.3 when these errors occurred. And as soon as I reverted to 1.2.8, the errors disappeared. FWIW, OMPI allocates shared memory based on the number of peers on the host. This a

Re: [OMPI users] v1.3: mca_common_sm_mmap_init error

2009-02-02 Thread Prentice Bisbal
Is there anyone else who experienced this problem with a RHEL-based distro that can upgrade to 5.3 to confirm my experience? -- Prentice Prentice Bisbal wrote: > No. I was running just a simple "Hello, world" program to test v1.3 when > these errors occurred. And as soon as I reverted to 1.2.8, th

Re: [OMPI users] v1.3: mca_common_sm_mmap_init error

2009-02-02 Thread Prentice Bisbal
No. I was running just a simple "Hello, world" program to test v1.3 when these errors occurred. And as soon as I reverted to 1.2.8, the errors disappeared. Interestingly enough, I just upgraded my cluster to PU_IAS 5.3, and now I can't reproduce the problem, but HPL fails with a segfault, which I'll

Re: [OMPI users] v1.3: mca_common_sm_mmap_init error

2009-02-02 Thread Reuti
On 01.02.2009 at 12:43, Jeff Squyres wrote: Could the nodes be running out of shared memory and/or temp filesystem space? I still have this issue, and it happens only from time to time. But despite the fact that SGE's qrsh is used automatically, more severe is the fact that on the slave

Re: [OMPI users] v1.3: mca_common_sm_mmap_init error

2009-02-01 Thread Jeff Squyres
Could the nodes be running out of shared memory and/or temp filesystem space? On Jan 29, 2009, at 3:05 PM, Rolf vandeVaart wrote: I have not seen this before. I assume that for some reason, the shared memory transport layer cannot create the file it uses for communicating within a node

Re: [OMPI users] v1.3: mca_common_sm_mmap_init error

2009-01-29 Thread Rolf vandeVaart
I have not seen this before. I assume that for some reason, the shared memory transport layer cannot create the file it uses for communicating within a node. Open MPI then selects some other transport (TCP, openib) to communicate within the node so the program runs fine. The code has not c

Re: [OMPI users] v1.3: mca_common_sm_mmap_init error

2009-01-27 Thread Mostyn Lewis
Sort of ditto but with SVN release at 20123 (and earlier): e.g. [r2250_46:30018] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_46_0/25682/1/shared_mem_pool.r2250_46 failed with errno=2 [r2250_63:05292] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-s
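
A note for readers decoding the message above: on Linux, errno=2 is ENOENT ("No such file or directory"), meaning the file, or a directory in its path, did not exist when the open call was made. The following is a minimal Python sketch for looking up such codes. It is not part of the original thread, and describe_errno is a hypothetical helper name chosen for illustration.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, OS message) for a numeric errno value.

    describe_errno is an illustrative helper, not an Open MPI function.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # The value 2 is taken from the "failed with errno=2" messages in the thread.
    name, message = describe_errno(2)
    print(f"errno=2 is {name}: {message}")
```

The same lookup works for any other errno value that shows up in similar messages; the message text comes from the local operating system, so its wording may vary between platforms.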

[OMPI users] v1.3: mca_common_sm_mmap_init error

2009-01-27 Thread Prentice Bisbal
I just installed OpenMPI 1.3 with tight integration for SGE. Version 1.2.8 was working just fine for several months in the same arrangement. Now that I've upgraded to 1.3, I get the following errors in my standard error file: mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prent i