Or you can just add --leave-session-attached to your mpirun cmd line
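For example (just a sketch; the slot count and binary name are
placeholders):

  mpirun --leave-session-attached -np 8 ./hello_world

The idea is that the orted daemons then stay attached to their qrsh
sessions instead of daemonizing, so they remain under SGE's
sge_shepherd process tree.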
On Mar 19, 2009, at 8:10 AM, Rolf Vandevaart wrote:
On 03/19/09 09:55, Dave Love wrote:
Prentice Bisbal writes:
I just installed OpenMPI 1.3 with tight integration for SGE. Version
1.2.8 was working just fine for several m
On 03/19/09 09:55, Dave Love wrote:
Prentice Bisbal writes:
I just installed OpenMPI 1.3 with tight integration for SGE. Version
1.2.8 was working just fine for several months in the same arrangement.
Now that I've upgraded to 1.3, I get the following errors in my standard
error file:
mca_co
Prentice Bisbal writes:
> I just installed OpenMPI 1.3 with tight integration for SGE. Version
> 1.2.8 was working just fine for several months in the same arrangement.
>
> Now that I've upgraded to 1.3, I get the following errors in my standard
> error file:
>
> mca_common_sm_mmap_init: open /tm
* Rolf Vandevaart [02/17/2009 11:32]:
> There is a ticket for this.
>
> https://svn.open-mpi.org/trac/ompi/ticket/1783
>
> I am working on it. I do not have a workaround. I had a fix but ran into
> some issues with getting the -notify flag to work right with a
> non-daemonized orted.
>
> Fix w
On 02/17/09 11:18, Daniel De Marco wrote:
Hi,
* Reuti [02/02/2009 03:43]:
But despite the fact that SGE's qrsh is used automatically, the more
severe problem is that on the slave nodes the orted daemons are
pushed into daemonland and are no longer bound to the sge_shepherd:
3173 1 /usr/
Hi,
* Reuti [02/02/2009 03:43]:
> But despite the fact that SGE's qrsh is used automatically, the more
> severe problem is that on the slave nodes the orted daemons are
> pushed into daemonland and are no longer bound to the sge_shepherd:
>
> 3173 1 /usr/sge/bin/lx24-x86/sge_execd
> 3431
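One quick way to check this on a slave node while a job is running (an
illustrative command only, adjust to taste) is to look at each orted's
parent process:

  ps -eo pid,ppid,args | egrep 'sge_shepherd|orted'

If the orted's PPID is 1 rather than an sge_shepherd PID, it has
detached from SGE's process tree, which is the behaviour described
above.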
Jeff Squyres writes:
> Could the nodes be running out of shared memory and/or temp filesystem
> space?
I'm also seeing this non-reproducibly (on OpenSuSE 10.3, with Sun's
Clustertools 8.1 prerelease on dual Barcelona nodes during PMB runs
under SGE). I haven't had time to build the final 1.3 re
Prentice Bisbal wrote:
Jeff Squyres wrote:
On Feb 2, 2009, at 4:48 PM, Prentice Bisbal wrote
No. I was running just a simple "Hello, world" program to test v1.3 when
these errors occurred. And as soon as I reverted to 1.2.8, the errors
disappeared.
FW
By peers I mean the number of MPI processes on the same host. So if
you run with 4 processes on a single host, OMPI sets up shared memory
for those 4 processes during MPI_INIT, regardless of whether you call
MPI send/receive functions or not.
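For instance, the sizing knobs behind this can be inspected with
ompi_info (a sketch; the exact parameter names differ between
releases):

  ompi_info --param btl sm
  ompi_info --param mpool sm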
On Feb 3, 2009, at 1:15 PM, Prentice Bisbal wr
Jeff Squyres wrote:
> On Feb 2, 2009, at 4:48 PM, Prentice Bisbal wrote:
>
>> No. I was running just a simple "Hello, world" program to test v1.3 when
> these errors occurred. And as soon as I reverted to 1.2.8, the errors
>> disappeared.
>
> FWIW, OMPI allocates shared memory based on the number
On Feb 2, 2009, at 4:48 PM, Prentice Bisbal wrote:
No. I was running just a simple "Hello, world" program to test v1.3 when
these errors occurred. And as soon as I reverted to 1.2.8, the errors
disappeared.
FWIW, OMPI allocates shared memory based on the number of peers on the
host. This a
Is there anyone else who experienced this problem with a RHEL-based
distro who can upgrade to 5.3 and confirm my experience?
--
Prentice
Prentice Bisbal wrote:
> No. I was running just a simple "Hello, world" program to test v1.3 when
> these errors occurred. And as soon as I reverted to 1.2.8, th
No. I was running just a simple "Hello, world" program to test v1.3 when
these errors occurred. And as soon as I reverted to 1.2.8, the errors
disappeared.
Interestingly enough, I just upgraded my cluster to PU_IAS 5.3, and now
I can't reproduce the problem but HPL fails with a segfault, which I'll
On 01.02.2009, at 12:43, Jeff Squyres wrote:
Could the nodes be running out of shared memory and/or temp
filesystem space?
I still have this issue, and it happens only from time to time. But
despite the fact that SGE's qrsh is used automatically, the more
severe problem is that on the slave
Could the nodes be running out of shared memory and/or temp filesystem
space?
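A quick, illustrative check on a compute node would be something like:

  df -h /tmp /dev/shm

and, from inside the job script, ls -ld "$TMPDIR" to see whether the
per-job directory SGE hands out still exists.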
On Jan 29, 2009, at 3:05 PM, Rolf vandeVaart wrote:
I have not seen this before. I assume that for some reason, the
shared memory transport layer cannot create the file it uses for
communicating within a node
I have not seen this before. I assume that for some reason, the shared
memory transport layer cannot create the file it uses for communicating
within a node. Open MPI then selects some other transport (TCP, openib)
to communicate within the node so the program runs fine.
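One way to test that assumption (a sketch, using the usual MCA
selection syntax) is to force the shared-memory path and see whether
the job then fails outright instead of silently falling back:

  mpirun --mca btl sm,self -np 4 ./hello_world

Conversely, running with --mca btl ^sm rules the sm transport out
entirely.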
The code has not c
Sort of ditto, but with SVN revision 20123 (and earlier):
e.g.
[r2250_46:30018] mca_common_sm_mmap_init: open
/tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_46_0/25682/1/shared_mem_pool.r2250_46
failed with errno=2
[r2250_63:05292] mca_common_sm_mmap_init: open
/tmp/45139.1.all.q/openmpi-s
I just installed OpenMPI 1.3 with tight integration for SGE. Version
1.2.8 was working just fine for several months in the same arrangement.
Now that I've upgraded to 1.3, I get the following errors in my standard
error file:
mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prenti
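One thing worth checking in the job script before mpirun runs (a
hedged sketch; $NSLOTS is SGE's slot count variable) is whether the
per-job TMPDIR that SGE exports actually exists on the node:

  echo "TMPDIR=$TMPDIR"
  ls -ld "$TMPDIR"
  mpirun -np $NSLOTS ./hello_world

If that directory is missing or removed early, the shared memory
backing file under openmpi-sessions-* cannot be opened.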