[OMPI users] ompi-checkpoint fails sometimes

2010-05-11 Thread ananda.mudar
Hi, I am using Open MPI 1.3.4 with BLCR. Sometimes I run into a strange problem with the ompi-checkpoint command. Even though I see that all MPI processes (equal to the np argument) are running, ompi-checkpoint fails at times. I have always seen this failure when the MPI processes spawned

Re: [OMPI users] Questions about MPI_Isend

2010-05-11 Thread Gijsbert Wiesenekker
On May 11, 2010, at 9:18, Gijsbert Wiesenekker wrote: > An OpenMPI program of mine that uses MPI_Isend and MPI_Irecv crashes my Fedora Linux kernel (invalid opcode) after some non-reproducible time, which makes it hard to debug (there is no trace, even with the debug kernel, and if I ru

Re: [OMPI users] PGI problems

2010-05-11 Thread Jeff Squyres
FWIW, I have PGI 10.0 (but nothing more recent) and it compiles on RHEL5.4 for me. I know there are subtle interactions between the PGI compiler suite and the back-end compiler; there could be some set of combinations thereof between 10.1 and 10.4 that got mucked up in there somewhere...? Anyhoo

Re: [OMPI users] Very poor performance with btl sm on twin nehalem servers with Mellanox Technologies MT26428 (ConnectX)

2010-05-11 Thread Oskar Enoksson
Sorry, the kernel is 2.6.32.12, not 2.6.32.2. And I forgot to mention the system is CentOS 5.4. And further ... 25MB/s is after tweaking btl_sm_num_fifos=8 and btl_sm_eager_limit=65536. Without those the rate is 9MB/s for 1MB packets and 1.5MB/s for 10kB packets :-( On 05/11/2010 08:19 PM, Oskar

[OMPI users] Very poor performance with btl sm on twin nehalem servers with Mellanox Technologies MT26428 (ConnectX)

2010-05-11 Thread Oskar Enoksson
I have a cluster with two Intel Xeon Nehalem E5520 CPUs per server (quad-core, 2.27GHz). The interconnect is 4x QDR InfiniBand (Mellanox ConnectX). I have compiled and installed OpenMPI 1.4.2. The kernel is 2.6.32.2 and I have compiled the kernel myself. I use gridengine 6.2u5. OpenMPI was compiled

Re: [OMPI users] Questions about MPI_Isend

2010-05-11 Thread Jeff Squyres
Dick is, of course, correct. This topic has come up several times on this list: Open MPI currently does not do this kind of check. It is therefore possible for a sender to exhaust memory on a receiver if, for example, it continually sends short/eager messages that the receiver consumes off the
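A minimal sketch of the pattern described above (not from the thread; the message count and sleep are illustrative), assuming the payload stays below the eager limit so each send completes locally regardless of what the receiver is doing:

#include <mpi.h>
#include <unistd.h>

#define NMSG 100000   /* illustrative count, not from the post */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int msg = 0;
    if (rank == 0) {
        /* Each short message is sent eagerly, so MPI_Send returns as soon
           as the data is buffered, whether or not rank 1 has posted a
           matching receive. */
        for (int i = 0; i < NMSG; i++)
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        sleep(10);    /* receiver falls behind */
        for (int i = 0; i < NMSG; i++)
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}

Rank 0 finishes all of its sends long before rank 1 has matched any of them; in the meantime the messages accumulate in rank 1's unexpected-message queue.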

Re: [OMPI users] Questions about MPI_Isend

2010-05-11 Thread Jeff Squyres
On May 11, 2010, at 6:49 AM, Terry Dontje wrote: > Correct, the completion of an MPI_Isend request only says the message buffer is no longer needed. You could use synchronous-mode sends (MPI_Issend), whose requests will complete when the message is being processed at the destination (that i
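A minimal sketch of the synchronous-mode variant mentioned above (illustrative only, not code from the thread): because an MPI_Issend request cannot complete until the matching receive has started, a completed request tells the sender that the destination is actually consuming messages, not merely that the send buffer is reusable.

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, payload = 42;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Request req;
        /* Synchronous-mode nonblocking send: completion implies the
           receiver has begun matching this message. */
        MPI_Issend(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}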

Re: [OMPI users] PGI problems

2010-05-11 Thread Prentice Bisbal
Dave Love wrote: > I wrote: >> I'll see if we can get a compiler update and report back. > Installing PGI 10.5 has fixed the configure problem for me. So, for the archives (or FAQ?) it was a PGI bug, fixed sometime between 10.1 and 10.4, and apparently not present in 9.0-3. It's also pr

Re: [OMPI users] PGI problems

2010-05-11 Thread Dave Love
I wrote: > I'll see if we can get a compiler update and report back. Installing PGI 10.5 has fixed the configure problem for me. So, for the archives (or FAQ?) it was a PGI bug, fixed sometime between 10.1 and 10.4, and apparently not present in 9.0-3. It's also present in 8.0-3. Thanks to Pre

Re: [OMPI users] Questions about MPI_Isend

2010-05-11 Thread Richard Treumann
The MPI standard requires that when there is a free-running task posting isends to a task that is not keeping up on receives, the sending task will switch to synchronous isend BEFORE the receive side runs out of memory and fails. There should be no need for the sender to use MPI_Issend becaus

Re: [OMPI users] Questions about MPI_Isend

2010-05-11 Thread Gabriele Fatigati
Yes, that's correct, but you can use the number of MPI_Test failures on MPI_Isend requests to define a limit. Remember that if you send large buffers (larger than the eager_limit) many times in a short period, it may be better to use the MPI_Send routine rather than MPI_Isend, to avoid too much buffer memory copying. 2010/5/

Re: [OMPI users] Questions about MPI_Isend

2010-05-11 Thread Terry Dontje
Gijsbert Wiesenekker wrote: On May 11, 2010, at 9:29, Gabriele Fatigati wrote: Dear Gijsbert, > Ideally I would like to check how many MPI_Isend messages have not been processed yet, so that I can stop sending messages if there are 'too many' waiting. Is there a way to do this? you can

Re: [OMPI users] mpirun -np 4 hello_world; on a eight processor shared memory machine produces wrong output

2010-05-11 Thread Pankatz, Klaus
This problem is solved now. After uninstalling all other MPI distributions on that machine, OpenMPI 1.4.1 now works perfectly well. Thanks very much for your advice! From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Eugen

Re: [OMPI users] PGI problems

2010-05-11 Thread Dave Love
Prentice Bisbal writes: > Since I successfully compiled 1.4.1 with PGI 9 and 1.4.2 with PGI 10.4, Thanks. The difference appears to be the compiler versions. > I suspect the problem is local to you. Can you go through your environment and make sure you don't have any settings that are in

Re: [OMPI users] Questions about MPI_Isend

2010-05-11 Thread Gijsbert Wiesenekker
On May 11, 2010, at 9:29, Gabriele Fatigati wrote: > Dear Gijsbert, >> Ideally I would like to check how many MPI_Isend messages have not been processed yet, so that I can stop sending messages if there are 'too many' waiting. Is there a way to do this? > you can check numbe

Re: [OMPI users] Questions about MPI_Isend

2010-05-11 Thread Gabriele Fatigati
Dear Gijsbert, > Ideally I would like to check how many MPI_Isend messages have not been processed yet, so that I can stop sending messages if there are 'too many' waiting. Is there a way to do this? You can check the number of pending messages simply by using the MPI_Test function. It returns false if the
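A minimal sketch of that idea (the helper, counts, and limit below are hypothetical, not from the thread): keep the MPI_Isend requests in an array and use MPI_Test to count how many have not completed yet, then stop posting new sends once that count reaches a chosen limit.

#include <mpi.h>

#define NSEND 1000      /* illustrative totals and limit, not from the post */
#define MAX_PENDING 16

/* Count outstanding isend requests.  MPI_Test sets flag to 0 for a request
   that has not completed; completed requests are freed and replaced by
   MPI_REQUEST_NULL, so they are skipped on later passes. */
static int count_pending(MPI_Request *reqs, int n)
{
    int pending = 0;
    for (int i = 0; i < n; i++) {
        if (reqs[i] == MPI_REQUEST_NULL)
            continue;
        int flag = 0;
        MPI_Test(&reqs[i], &flag, MPI_STATUS_IGNORE);
        if (!flag)
            pending++;   /* still in flight */
    }
    return pending;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int buf[NSEND];
    MPI_Request reqs[NSEND];
    for (int i = 0; i < NSEND; i++)
        reqs[i] = MPI_REQUEST_NULL;

    if (rank == 0) {
        for (int i = 0; i < NSEND; i++) {
            /* Throttle: do not post a new isend while too many earlier
               send requests are still incomplete. */
            while (count_pending(reqs, i) >= MAX_PENDING)
                ;   /* spin; real code might progress other work here */
            buf[i] = i;
            MPI_Isend(&buf[i], 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &reqs[i]);
        }
        MPI_Waitall(NSEND, reqs, MPI_STATUSES_IGNORE);
    } else if (rank == 1) {
        for (int i = 0; i < NSEND; i++)
            MPI_Recv(&buf[i], 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}

Note that small messages sent eagerly may complete before the receiver has matched them, so the pending count can stay low even when the receiver is behind; that is where the MPI_Issend suggestion elsewhere in this thread comes in.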

[OMPI users] Questions about MPI_Isend

2010-05-11 Thread Gijsbert Wiesenekker
An OpenMPI program of mine that uses MPI_Isend and MPI_Irecv crashes my Fedora Linux kernel (invalid opcode) after some non-reproducible time, which makes it hard to debug (there is no trace, even with the debug kernel, and if I run it under valgrind it does not crash). My guess is that the kern