Re: [OMPI users] Any scientific application heavily using MPI_Barrier?

2009-08-24 Thread Eugene Loh
Richard Treumann wrote: Guess I should have kept quiet a bit longer. As I recall we had already seen a counter example to Jeff's stronger statement and that motivated my narrower one. If there are no wildcard receives - every MPI_Barrier call is semantically irrelevant. ...

Re: [OMPI users] very bad parallel scaling of vasp using openmpi

2009-08-24 Thread jimkress_58
Gus, you hit the nail on the head. CPMD and VASP are both fine-grained parallel quantum mechanics molecular dynamics codes. I believe CPMD has implemented the domain decomposition methodology found in gromacs (a classical fine-grained molecular dynamics code), which significantly diminishes the s...

Re: [OMPI users] Any scientific application heavily using MPI_Barrier?

2009-08-24 Thread Richard Treumann
Guess I should have kept quiet a bit longer. As I recall we had already seen a counter example to Jeff's stronger statement and that motivated my narrower one. If there are no wildcard receives - every MPI_Barrier call is semantically irrelevant. Do you have a counter example? ...

Re: [OMPI users] Any scientific application heavily using MPI_Barrier?

2009-08-24 Thread Jeff Squyres
On Aug 24, 2009, at 4:23 PM, Eugene Loh wrote: Meanwhile, the last process, P2, is waiting on a receive before it enters the barrier. Right-o -- I missed that key point. So yes, P0's send will definitely match that first recv (before the barrier). If the barrier was not there and the P0 ...

Re: [OMPI users] Any scientific application heavily using MPI_Barrier?

2009-08-24 Thread Eugene Loh
Jeff Squyres wrote: On Aug 24, 2009, at 1:03 PM, Eugene Loh wrote: E.g., let's say P0 and P1 each send a message to P2, both using the same tag and communicator. Let's say P2 does two receives on that communicator and tag, using a wildcard source. So, the messages could be received in either order. ...

Re: [OMPI users] Bug? openMPI interpretation of SLURM environment variables

2009-08-24 Thread matthew.piehl
Hello again, As you requested: node64-test ~>salloc -n7 salloc: Granted job allocation 827 node64-test ~>srun hostname node64-17.... node64-17.... node64-20.... node64-18.... node64-19.... node64-18....xx

Re: [OMPI users] Any scientific application heavily using MPI_Barrier?

2009-08-24 Thread Richard Treumann
As far as I can see, Jeff's analysis is dead on. The matching order at P2 is based on the order in which the envelopes from P0 and P1 show up at P2. The Barrier does not force an order between the communication paths P0->P2 vs. P1->P2. The MPI standard does not even say what "show up" means unless ...
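
To make the scenario concrete, here is a minimal sketch (my own illustration, not code posted in the thread) of the P0/P1/P2 pattern under discussion: ranks 0 and 1 each send one message to rank 2 with the same tag and communicator, rank 2 posts two MPI_ANY_SOURCE receives, rank 2 completes the first receive before entering the barrier, and rank 1 does not send until after the barrier. Under those assumptions only rank 0's message can have been sent when the first receive is posted; remove the barrier and the two messages may match the receives in either order.

/* barrier_order.c -- minimal sketch of the P0/P1/P2 wildcard-receive scenario.
 * Assumes exactly 3 ranks.  The barrier separates rank 0's send (and rank 2's
 * first MPI_ANY_SOURCE receive) from rank 1's send, so the first receive can
 * only match rank 0; without the barrier either order is possible. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, payload;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        payload = 0;
        MPI_Send(&payload, 1, MPI_INT, 2, 99, MPI_COMM_WORLD);   /* before barrier */
        MPI_Barrier(MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Barrier(MPI_COMM_WORLD);
        payload = 1;
        MPI_Send(&payload, 1, MPI_INT, 2, 99, MPI_COMM_WORLD);   /* after barrier */
    } else if (rank == 2) {
        MPI_Recv(&payload, 1, MPI_INT, MPI_ANY_SOURCE, 99, MPI_COMM_WORLD, &status);
        printf("first recv matched rank %d\n", status.MPI_SOURCE);
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Recv(&payload, 1, MPI_INT, MPI_ANY_SOURCE, 99, MPI_COMM_WORLD, &status);
        printf("second recv matched rank %d\n", status.MPI_SOURCE);
    }

    MPI_Finalize();
    return 0;
}

Built and run with something like "mpicc barrier_order.c -o barrier_order; mpirun -np 3 barrier_order", the first line should always report rank 0; comment out the three MPI_Barrier calls and either ordering becomes legal.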

Re: [OMPI users] Bug? openMPI interpretation of SLURM environment variables

2009-08-24 Thread Ralph Castain
Very interesting! I see the problem - we have never encountered SLURM_TASKS_PER_NODE in that format, while SLURM_JOB_CPUS_PER_NODE indicates that we have indeed been allocated two processors on each of the nodes! So when you just do mpirun without specifying the number of processes, we will ...
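
For readers unfamiliar with the variable: SLURM_TASKS_PER_NODE uses a compressed notation such as 2(x3),1, meaning 2 tasks on each of 3 nodes followed by 1 task on a further node (the exact value from this allocation is cut off in the printenv output elsewhere in this digest). A rough sketch of expanding that notation, my own illustration rather than Open MPI or SLURM code, with a hypothetical fallback value:

/* slurm_tasks.c -- expand the compressed SLURM_TASKS_PER_NODE notation,
 * e.g. "2(x3),1" = 2 tasks on each of 3 nodes plus 1 task on a fourth node.
 * Illustration only; not code from Open MPI or SLURM. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const char *env = getenv("SLURM_TASKS_PER_NODE");
    char buf[256];
    /* Hypothetical example value used when not running under SLURM. */
    snprintf(buf, sizeof(buf), "%s", env ? env : "2(x3),1");

    int total = 0, node = 0;
    for (char *tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ",")) {
        int tasks = 0, repeat = 1;
        if (sscanf(tok, "%d(x%d)", &tasks, &repeat) < 1)
            continue;                            /* skip anything unparseable */
        for (int i = 0; i < repeat; i++) {
            printf("node %d: %d task(s)\n", node++, tasks);
            total += tasks;
        }
    }
    printf("total tasks: %d\n", total);
    return 0;
}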

Re: [OMPI users] Bug? openMPI interpretation of SLURM environment variables

2009-08-24 Thread matthew.piehl
Hello, Hopefully the information below will be helpful. SLURM Version: 1.3.15 node64-test ~>salloc -n3 salloc: Granted job allocation 826 node64-test ~>srun hostname node64-24.... node64-25.... node64-24.... node64-test ~>printenv | grep SLURM SLURM_...

Re: [OMPI users] Any scientific application heavily using MPI_Barrier?

2009-08-24 Thread Jeff Squyres
On Aug 24, 2009, at 1:03 PM, Eugene Loh wrote: E.g., let's say P0 and P1 each send a message to P2, both using the same tag and communicator. Let's say P2 does two receives on that communicator and tag, using a wildcard source. So, the messages could be received in either order. One could ...

Re: [OMPI users] Bug? openMPI interpretation of SLURM environment variables

2009-08-24 Thread Ralph Castain
Haven't seen that before on any of our machines. Could you do "printenv | grep SLURM" after the salloc and send the results? What version of SLURM is this? Please run "mpirun --display-allocation hostname" and send the results. Thanks Ralph On Mon, Aug 24, 2009 at 11:30 AM, wrote: > Hello, >

Re: [OMPI users] mca_pml_ob1_send blocks

2009-08-24 Thread Shaun Jackman
I neglected to include some pertinent information: I'm using Open MPI 1.3.2. Here's a backtrace: #0 0x002a95e6890c in epoll_wait () from /lib64/tls/libc.so.6 #1 0x002a9623a39c in epoll_dispatch () from /home/sjackman/arch/xhost/lib/libopen-pal.so.0 #2 0x002a96238f10 in opal_even...

[OMPI users] mca_pml_ob1_send blocks

2009-08-24 Thread Shaun Jackman
Hi, I'm seeing MPI_Send block in mca_pml_ob1_send. The packet is shorter than the eager transmit limit for shared memory (3300 bytes < 4096 bytes). I'm trying to determine if MPI_Send is blocking due to a deadlock. Will MPI_Send block even when sending a packet eagerly? Thanks, Shaun
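
For what it's worth, the MPI standard allows MPI_Send to block at any message size: eager delivery is an implementation choice, and even an eager-sized message can stall if, for example, the shared-memory buffers fill up because the receiver is not draining them. Below is a minimal sketch (my own illustration, not Shaun's program) of the classic exchange whose correctness depends on such buffering, together with the MPI_Sendrecv form that does not:

/* sendrecv_safe.c -- two ranks exchange a message of roughly the size
 * mentioned in the thread.  The commented-out Send-then-Recv pattern is only
 * correct while the implementation buffers the sends; MPI_Sendrecv never
 * depends on buffering.  Illustration only; assumes exactly 2 ranks. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define N 3300

int main(int argc, char **argv)
{
    char sendbuf[N], recvbuf[N];
    int rank, peer;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;
    memset(sendbuf, rank, N);

    /* Unsafe pattern: both ranks send first, then receive.
     * MPI_Send(sendbuf, N, MPI_CHAR, peer, 0, MPI_COMM_WORLD);
     * MPI_Recv(recvbuf, N, MPI_CHAR, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
     */

    /* Safe pattern: a combined send/receive cannot deadlock on buffering. */
    MPI_Sendrecv(sendbuf, N, MPI_CHAR, peer, 0,
                 recvbuf, N, MPI_CHAR, peer, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d exchanged %d bytes with rank %d\n", rank, N, peer);
    MPI_Finalize();
    return 0;
}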

[OMPI users] Bug? openMPI interpretation of SLURM environment variables

2009-08-24 Thread matthew.piehl
Hello, I seem to have run into an interesting problem with openMPI. After allocating 3 processors and confirming that the 3 processors are allocated, mpirun on a simple mpitest program seems to run on 4 processors. We have 2 processors per node. I can repeat this case with any odd number of nodes, o...

Re: [OMPI users] Any scientific application heavily using MPI_Barrier?

2009-08-24 Thread Eugene Loh
Going back to this thread from earlier this calendar year... Ganesh wrote: Hi Dick, Jeff paraphrased an unnamed source as suggesting that: "any MPI program that relies on a barrier for correctness is an incorrect MPI application." That is probably too strong. How about this ...

Re: [OMPI users] Help: orted: command not found.

2009-08-24 Thread Yann JOBIC
Lee Amy wrote: Hi, I run some programs using OpenMPI 1.3.3, and when I execute the command I encounter the following error messages. sh: orted: command not found -- A daemon (pid 6797) died unexpectedly with status 127

Re: [OMPI users] Help: orted: command not found.

2009-08-24 Thread Tomislav Maric
Lee Amy wrote: > Hi, > > I run some programs using OpenMPI 1.3.3, and when I execute the > command I encounter the following error messages. > > sh: orted: command not found > -- > A daemon (pid 6797) died unexpectedly with status 127 ...