[OMPI users] query regarding Open fabrics connections

2011-09-08 Thread bhimesh akula
Hi, while running MPI cases, the option btl_openib_cpc_include is used to select the connection manager. Open MPI provides three types of connection managers: 1) OOB, 2) XOOB, 3) RDMA_CM. We tried to use ib_cm as the connection manager but failed. Is it possible? If so, can you explain the procedure to me? Thanks & regards, Punya
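
For illustration, the connection manager is normally chosen by passing the MCA parameter on the mpirun command line; the example below is only a sketch (the application name ./my_app is a placeholder, and the set of accepted values, e.g. oob, xoob, rdmacm, depends on the Open MPI version and build):

  mpirun --mca btl openib,self --mca btl_openib_cpc_include rdmacm -np 4 ./my_app

Running "ompi_info --param btl openib" and looking at the btl_openib_cpc_include entry should show which connection managers the installed build actually supports.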

Re: [OMPI users] Can you set the gid of the processes created by mpirun?

2011-09-08 Thread Reuti
Am 08.09.2011 um 04:04 schrieb Ed Blosch: > Typically it is something like 'qsub -W group_list=groupB > myjob.sh'. Ultimately myjob.sh runs with gid groupB on some host in the > cluster. When that script reaches the mpirun command, then mpirun and the > processes started on the same host all run

[OMPI users] Problem with MPI_BARRIER

2011-09-08 Thread Ghislain Lartigue
Hello, at a given point in my (Fortran90) program, I write:

===
  start_time = MPI_Wtime()
  call MPI_BARRIER(...)
  new_time = MPI_Wtime() - start_time
  write(*,*) "barrier time =", new_time
==

and then I run my code... I expected that the values of "new_time" would ran
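
For reference, a minimal, self-contained version of the measurement described above (a sketch only; the real code presumably does this around its own communicator and work):

  program barrier_time
    use mpi
    implicit none
    integer :: ierr, rank
    double precision :: start_time, new_time

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

    start_time = MPI_WTIME()
    call MPI_BARRIER(MPI_COMM_WORLD, ierr)      ! the barrier being timed
    new_time = MPI_WTIME() - start_time

    write(*,*) "rank", rank, "barrier time =", new_time
    call MPI_FINALIZE(ierr)
  end program barrier_time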

Re: [OMPI users] query regarding Open fabrics connections

2011-09-08 Thread Jeff Squyres
On Sep 8, 2011, at 3:15 AM, bhimesh akula wrote: > while running MPI cases, option btl_openib_cpc_include used to select the > connection manager. MPI provides three types connection managers 1) OOB 2) XOOB > 3) RDMA_CM, but we try to use ib_cm as connection manager but failed. Is it > possible? if so,

Re: [OMPI users] Problem with MPI_BARRIER

2011-09-08 Thread Jeff Squyres
The order in which processes hit the barrier is only one factor in the time it takes for that process to finish the barrier. An easy way to think of a barrier implementation is a "fan in/fan out" model. When each nonzero rank process calls MPI_BARRIER, it sends a message saying "I have hit the bar
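
To make the two phases concrete, here is a deliberately naive linear fan-in/fan-out barrier written against MPI point-to-point calls; real implementations use trees and other optimizations, so this is only a sketch of the idea:

  subroutine linear_barrier(comm, ierr)
    ! fan-in: every nonzero rank tells rank 0 it has arrived;
    ! fan-out: rank 0 releases everyone once all have arrived.
    use mpi
    implicit none
    integer, intent(in)  :: comm
    integer, intent(out) :: ierr
    integer :: rank, nprocs, i, dummy, status(MPI_STATUS_SIZE)

    dummy = 0
    call MPI_COMM_RANK(comm, rank, ierr)
    call MPI_COMM_SIZE(comm, nprocs, ierr)

    if (rank == 0) then
       do i = 1, nprocs-1        ! fan-in: collect arrival messages
          call MPI_RECV(dummy, 1, MPI_INTEGER, MPI_ANY_SOURCE, 0, comm, status, ierr)
       end do
       do i = 1, nprocs-1        ! fan-out: send the release "acks", one by one
          call MPI_SEND(dummy, 1, MPI_INTEGER, i, 1, comm, ierr)
       end do
    else
       call MPI_SEND(dummy, 1, MPI_INTEGER, 0, 0, comm, ierr)         ! "I have hit the barrier"
       call MPI_RECV(dummy, 1, MPI_INTEGER, 0, 1, comm, status, ierr) ! wait for the release
    end if
  end subroutine linear_barrier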

Re: [OMPI users] Problem with MPI_BARRIER

2011-09-08 Thread Ghislain Lartigue
Thank you for this explanation but indeed this confirms that the LAST process that hits the barrier should go through nearly instantaneously (except for the broadcast time for the acknowledgment signal). And this is not what happens in my code: EVERY process waits for a very long time before go

Re: [OMPI users] Problem with MPI_BARRIER

2011-09-08 Thread Jeff Squyres
The order in which you see stdout printed from mpirun is not necessarily reflective of the order in which things were actually printed. Remember that the stdout from each MPI process needs to flow through at least 3 processes and potentially across the network before it is actually displayed on mpirun

Re: [OMPI users] Problem with MPI_BARRIER

2011-09-08 Thread Ghislain Lartigue
This problem has nothing to do with stdout... Example with 3 processes:

P0 hits barrier at t=12
P1 hits barrier at t=27
P2 hits barrier at t=41

In this situation:

P0 waits 41-12 = 29
P1 waits 41-27 = 14
P2 waits 41-41 = 00

So I should see something like (no ordering is expected): barrier_time =

Re: [OMPI users] Problem with MPI_BARRIER

2011-09-08 Thread Jeff Squyres
On Sep 8, 2011, at 9:17 AM, Ghislain Lartigue wrote: > Example with 3 processes: > > P0 hits barrier at t=12 > P1 hits barrier at t=27 > P2 hits barrier at t=41 What is the unit of time here, and how well are these times synchronized? > In this situation: > P0 waits 41-12 = 29 > P1 waits 41-27

Re: [OMPI users] Problem with MPI_BARRIER

2011-09-08 Thread Ghislain Lartigue
These "times" have no units, it's just an example... Whatever units are used, at least one process should spend a very small of time in the barrier (compared to the other processes) and this is not what I see in my code. The network is supposed to be excellent: my machine is #9 in the top500 su

Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun?

2011-09-08 Thread Blosch, Edwin L
Yes, we build OpenMPI --without-torque. -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Reuti Sent: Thursday, September 08, 2011 4:33 AM To: Open MPI Users Subject: EXTERNAL: Re: [OMPI users] Can you set the gid of the processes creat

Re: [OMPI users] Problem with MPI_BARRIER

2011-09-08 Thread Rolf Riesen
On Thu Sep 8, 2011 15:41:57, Ghislain Lartigue wrote: > Ghislain These "times" have no units, it's just an example... > Ghislain Whatever units are used, at least one process should spend a very small amount of time in the barrier (compared to the other processes) and this is not what I see in

Re: [OMPI users] Problem with MPI_BARRIER

2011-09-08 Thread Teng Ma
I guess you forgot to count the "leaving time" (fan-out). When everyone hits the barrier, each process still needs an "ack" to leave. And remember that in most cases the leader process will send out the "acks" sequentially. It's very possible:

P0 barrier time = 29 + send/recv ack 0
P1 barrier time = 14 + send ack 0

Re: [OMPI users] Problem with MPI_BARRIER

2011-09-08 Thread Ghislain Lartigue
Thanks, I understand this but the delays that I measure are huge compared to a classical ack procedure... (1000x more) And this is repeatable: as far as I understand it, this shows that the network is not involved. Ghislain. On 8 Sep 2011, at 16:16, Teng Ma wrote: > I guess you forget to

Re: [OMPI users] Problem with MPI_BARRIER

2011-09-08 Thread Jai Dayal
what tick value are you using (i.e., what units are you using?) On Thu, Sep 8, 2011 at 10:25 AM, Ghislain Lartigue < ghislain.larti...@coria.fr> wrote: > Thanks, > > I understand this but the delays that I measure are huge compared to a > classical ack procedure... (1000x more) > And this is repe

Re: [OMPI users] Problem with MPI_BARRIER

2011-09-08 Thread Teng Ma
You'd better check process-core binding in your case. It looks to me like P0 and P1 are on the same node and P2 is on another node, which makes the ack to P0/P1 go through shared memory and the ack to P2 go through the network. 1000x is very possible: sm latency can be about 0.03 microsec; ethernet latency is about 20-30 m
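
If it helps, Open MPI can report where each rank is bound; the exact option spellings vary between releases (./my_app is a placeholder, and newer versions spell the binding option "--bind-to core"):

  mpirun --report-bindings --bind-to-core -np 3 ./my_app

The binding report printed at startup shows which node and cores each rank landed on, which would confirm or rule out the P0/P1 versus P2 split suggested above.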

Re: [OMPI users] Problem with MPI_BARRIER

2011-09-08 Thread Eugene Loh
I agree sentimentally with Ghislain. The time spent in a barrier should conceptually be some wait time, which can be very long (possibly on the order of milliseconds or even seconds), and the time to execute the barrier operations, which should essentially be "instantaneous" on some time scal

Re: [OMPI users] Problem with MPI_BARRIER

2011-09-08 Thread Ghislain Lartigue
I will check that, but as I said in my first email, this strange behaviour happens only in one place in my code. I have the same time/barrier/time procedure in other places (in the same code) and it works perfectly. At one place I have the following output (sorted) <00>(0) CAST GHOST DATA1 LOOP 1 b

Re: [OMPI users] Problem with MPI_BARRIER

2011-09-08 Thread Ghislain Lartigue
And to be precise, the units I use are not the direct result of MPI_Wtime(): new_time = (MPI_Wtime()-start_time)*1e9/(36^3). This means that you should multiply these times by ~20'000 to have ticks... On 8 Sep 2011, at 16:42, Ghislain Lartigue wrote: > I will check that, but as I said in first

Re: [OMPI users] Problem with MPI_BARRIER

2011-09-08 Thread Eugene Loh
On 9/8/2011 7:42 AM, Ghislain Lartigue wrote: I will check that, but as I said in first email, this strange behaviour happens only in one place in my code. Is the strange behavior on the first time, or much later on? (You seem to imply later on, but I thought I'd ask.) I agree the behavior i

Re: [OMPI users] Problem with MPI_BARRIER

2011-09-08 Thread Teng Ma
do barrier/time/barrier/time and run your code again. Teng > I will check that, but as I said in first email, this strange behaviour > happens only in one place in my code. > I have the same time/barrier/time procedure in other places (in the same > code) and it works perfectly. > > At one place
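
If I read the suggestion correctly, the idea is to add a barrier immediately before starting the timer so that arrival skew from the preceding computation is excluded from the measurement; a sketch, reusing the variable names from the earlier snippet (MPI_COMM_WORLD and ierr stand in for whatever communicator and error variable the real code uses):

  call MPI_BARRIER(MPI_COMM_WORLD, ierr)   ! synchronize first: absorb arrival skew
  start_time = MPI_Wtime()
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)   ! the barrier actually being timed
  new_time = MPI_Wtime() - start_time
  write(*,*) "barrier time =", new_time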

Re: [OMPI users] Problem with MPI_BARRIER

2011-09-08 Thread Ghislain Lartigue
This behavior happens at every call (first and subsequent). Here is my code (simplified):

  start_time = MPI_Wtime()
  call mpi_ext_barrier()
  new_time = MPI_Wtime()-start_time
  write(local_time,'(F9.1)') new_time*1.0e9_WP/(36.0_WP*36.0_WP

Re: [OMPI users] Problem with MPI_BARRIER

2011-09-08 Thread Teng Ma
If barrier/time/barrier/time solves your problem in each measurement, that means the computation above/below your barrier is not well "synchronized": its overhead differs for each process. On the 2nd/3rd/... round, the time to enter the barrier is too diverse, maybe ranging over [1, 1400]. This Barrier bec

Re: [OMPI users] Problem with MPI_BARRIER

2011-09-08 Thread Eugene Loh
I should know OMPI better than I do, but generally, when you make an MPI call, you could be diving into all kinds of other stuff. E.g., with non-blocking point-to-point operations, a message might make progress during another MPI call. E.g.,

  MPI_Irecv(recv_req)
  MPI_Isend(send_req)
  MPI_Wait(s
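
A hedged sketch of the sequence being described, with src, dst, tag, comm, n, and the buffers as placeholders; the point is only that the posted receive is free to progress inside any later MPI call, so its cost can show up in whatever call happens to be timed:

  call MPI_IRECV(rbuf, n, MPI_DOUBLE_PRECISION, src, tag, comm, recv_req, ierr)
  call MPI_ISEND(sbuf, n, MPI_DOUBLE_PRECISION, dst, tag, comm, send_req, ierr)
  call MPI_WAIT(send_req, status, ierr)   ! the pending receive may progress here...
  call MPI_BARRIER(comm, ierr)            ! ...or inside this barrier
  call MPI_WAIT(recv_req, status, ierr)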

Re: [OMPI users] Problem with MPI_BARRIER

2011-09-08 Thread Ghislain Lartigue
I guess you're perfectly right! I will try to test it tomorrow by putting a call system("wait(X)") before the barrier! Thanks, Ghislain. PS: if anyone has more information about the implementation of the MPI_IRECV() procedure, I would be glad to learn more about it! On 8 Sep 2011, at 17:35, Eug

[OMPI users] freezing in mpi_allreduce operation

2011-09-08 Thread Greg Fischer
I am seeing mpi_allreduce operations freeze execution of my code on some moderately-sized problems. The freeze does not manifest itself in every problem. In addition, it is in a portion of the code that is repeated many times. In the problem discussed below, the problem appears in the 60th itera

Re: [OMPI users] freezing in mpi_allreduce operation

2011-09-08 Thread Greg Fischer
Note also that coding the mpi_allreduce as:

  call mpi_allreduce(MPI_IN_PLACE,phim(0,1,1,1,grp),phim_size*im*jm*kmloc(coords(2)+1),mpi_real,mpi_sum,ang_com,ierr)

results in the same freezing behavior in the 60th iteration. (I don't recall why the arrays were being passed, possibly just a mistak
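
For reference, a minimal MPI_IN_PLACE reduction of the same shape (array names and sizes here are illustrative, not the actual phim arrays):

  program inplace_allreduce
    use mpi
    implicit none
    integer, parameter :: n = 8
    integer :: ierr, rank
    real :: phi(n)

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
    phi = real(rank + 1)

    ! each rank contributes its own phi and receives the global sum in place
    call MPI_ALLREDUCE(MPI_IN_PLACE, phi, n, MPI_REAL, MPI_SUM, MPI_COMM_WORLD, ierr)

    write(*,*) "rank", rank, "phi(1) =", phi(1)
    call MPI_FINALIZE(ierr)
  end program inplace_allreduce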