Hi,
while running MPI cases, the option btl_openib_cpc_include is used to select the
connection manager. Open MPI provides three types of connection managers: 1) OOB,
2) XOOB, 3) RDMA_CM. We tried to use ib_cm as the connection manager but failed.
Is it possible? If so, can you explain the procedure?
Thanks & regards,
Punya
On 08.09.2011, at 04:04, Ed Blosch wrote:
> Typically it is something like 'qsub -W group_list=groupB
> myjob.sh'. Ultimately myjob.sh runs with gid groupB on some host in the
> cluster. When that script reaches the mpirun command, then mpirun and the
> processes started on the same host all run
Hello,
at a given point in my (Fortran90) program, I write:
===
start_time = MPI_Wtime()
call MPI_BARRIER(...)
new_time = MPI_Wtime() - start_time
write(*,*) "barrier time =",new_time
===
and then I run my code...
I expected that the values of "new_time" would ran
On Sep 8, 2011, at 3:15 AM, bhimesh akula wrote:
> while running MPI cases,option btl_openib_cpc_include used to select the
> connection manager.MPI provides three types connection managers 1)OOB 2)XOOB
> 3)RDMA_CM,but we try to use ib_cm as connection manager but failed.Is it
> possible?if so,
The order in which processes hit the barrier is only one factor in the time it
takes each process to finish the barrier.
An easy way to think of a barrier implementation is a "fan in/fan out" model.
When each nonzero rank process calls MPI_BARRIER, it sends a message saying "I
have hit the bar
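That fan-in/fan-out description can be sketched as a toy model. This is only an illustration of the model, not Open MPI's actual implementation, and the arrival times and ack cost are made-up numbers:

```python
# Toy "fan in / fan out" barrier model: every rank reports its arrival
# to a leader, and the leader releases everyone once the last rank has
# arrived (plus an acknowledgment cost on the way out).
def barrier_times(arrivals, ack_cost=0.0):
    """Per-rank time spent inside the barrier, given arrival times."""
    release = max(arrivals) + ack_cost  # leader releases after the last arrival
    return [release - t for t in arrivals]

# The last rank to arrive pays only the fan-out ack cost.
print(barrier_times([3, 8, 20]))             # [17, 12, 0]
print(barrier_times([3, 8, 20], ack_cost=2)) # [19, 14, 2]
```

In this model the rank that arrives last spends essentially zero time in the barrier, which is the behavior being discussed below.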
Thank you for this explanation but indeed this confirms that the LAST process
that hits the barrier should go through nearly instantaneously (except for the
broadcast time for the acknowledgment signal).
And this is not what happens in my code: EVERY process waits for a very long
time before go
The order in which you see stdout printed from mpirun is not necessarily
reflective of the order in which things were actually printed. Remember that the
stdout from each MPI process needs to flow through at least 3 processes and
potentially across the network before it is actually displayed on mpirun
This problem has nothing to do with stdout...
Example with 3 processes:
P0 hits barrier at t=12
P1 hits barrier at t=27
P2 hits barrier at t=41
In this situation:
P0 waits 41-12 = 29
P1 waits 41-27 = 14
P2 waits 41-41 = 00
So I should see something like (no ordering is expected):
barrier_time =
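The wait arithmetic in this example can be checked mechanically (a trivial sketch, using the same numbers):

```python
# wait_i = (time the last process arrives) - (time process i arrives)
arrivals = {"P0": 12, "P1": 27, "P2": 41}
last = max(arrivals.values())
waits = {p: last - t for p, t in arrivals.items()}
print(waits)  # {'P0': 29, 'P1': 14, 'P2': 0}
```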
On Sep 8, 2011, at 9:17 AM, Ghislain Lartigue wrote:
> Example with 3 processes:
>
> P0 hits barrier at t=12
> P1 hits barrier at t=27
> P2 hits barrier at t=41
What is the unit of time here, and how well are these times synchronized?
> In this situation:
> P0 waits 41-12 = 29
> P1 waits 41-27
These "times" have no units, it's just an example...
Whatever units are used, at least one process should spend a very small amount of
time in the barrier (compared to the other processes), and this is not what I see
in my code.
The network is supposed to be excellent: my machine is #9 in the top500
su
Yes, we build OpenMPI --without-torque.
-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf
Of Reuti
Sent: Thursday, September 08, 2011 4:33 AM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Can you set the gid of the processes
creat
On Thu Sep 8, 2011 15:41:57, Ghislain Lartigue wrote:
> Ghislain These "times" have no units, it's just an example...
> Ghislain Whatever units are used, at least one process should spend a
very small of time in the barrier (compared to the other processes) and this is
not what I see in
I guess you forgot to count the "leaving time" (fan-out). When everyone
hits the barrier, each rank still needs an "ack" to leave. And remember, in most
cases the leader process will send out the "acks" sequentially. It's very
possible:
P0 barrier time = 29 + send/recv ack 0
P1 barrier time = 14 + send ack 0
Thanks,
I understand this but the delays that I measure are huge compared to a
classical ack procedure... (1000x more)
And this is repeatable: as far as I understand it, this shows that the network
is not involved.
Ghislain.
On Sep 8, 2011, at 16:16, Teng Ma wrote:
> I guess you forget to
what tick value are you using (i.e., what units are you using?)
On Thu, Sep 8, 2011 at 10:25 AM, Ghislain Lartigue <
ghislain.larti...@coria.fr> wrote:
> Thanks,
>
> I understand this but the delays that I measure are huge compared to a
> classical ack procedure... (1000x more)
> And this is repe
You'd better check process-core binding in your case. It looks to me like P0
and P1 are on the same node and P2 is on another node, which makes the ack to
P0/P1 go through shared memory and the ack to P2 go through the network.
1000x is very possible. sm latency can be about 0.03 microseconds; ethernet
latency is about 20-30 m
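Taking the latency figures quoted here at face value (they are rough order-of-magnitude numbers, not measurements; 25 us is an assumed midpoint of the quoted 20-30 us range), the ratio alone gets close to the 1000x Ghislain observed:

```python
sm_latency_us = 0.03   # quoted shared-memory latency, microseconds
eth_latency_us = 25.0  # assumed midpoint of the quoted 20-30 us range
ratio = eth_latency_us / sm_latency_us
print(f"ethernet / shared-memory latency ratio ~ {ratio:.0f}x")
```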
I agree sentimentally with Ghislain. The time spent in a barrier
should conceptually be some wait time, which can be very long (possibly
on the order of milliseconds or even seconds), and the time to execute
the barrier operations, which should essentially be "instantaneous" on
some time scal
I will check that, but as I said in first email, this strange behaviour happens
only in one place in my code.
I have the same time/barrier/time procedure in other places (in the same code)
and it works perfectly.
At one place I have the following output (sorted)
<00>(0) CAST GHOST DATA1 LOOP 1 b
and to fix things, the units I use are not the direct result of MPI_Wtime():
new_time = (MPI_Wtime()-start_time)*1e9/(36^3)
This means that you should multiply these times by ~20'000 to have ticks...
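As a sanity check on that scale factor (assuming the formula above is exactly what the code applies):

```python
# The printed values are (elapsed seconds) * 1e9 / 36**3, so the factor
# relating seconds to printed units is:
factor = 1e9 / 36**3
print(round(factor))  # 21433 -- i.e. roughly the "~20'000" quoted
# and multiplying a printed value by 36**3 recovers nanoseconds:
print(36**3)  # 46656
```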
On Sep 8, 2011, at 16:42, Ghislain Lartigue wrote:
> I will check that, but as I said in first
On 9/8/2011 7:42 AM, Ghislain Lartigue wrote:
I will check that, but as I said in first email, this strange behaviour happens
only in one place in my code.
Is the strange behavior on the first time, or much later on? (You seem
to imply later on, but I thought I'd ask.)
I agree the behavior i
do
barrier/time/barrier/time
and run your code again.
Teng
> I will check that, but as I said in first email, this strange behaviour
> happens only in one place in my code.
> I have the same time/barrier/time procedure in other places (in the same
> code) and it works perfectly.
>
> At one place
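The barrier/time/barrier/time suggestion can be illustrated with a thread-based analogue (Python threads standing in for MPI ranks; this only mimics the measurement pattern, not MPI itself, and the 0.05 s stagger is an arbitrary stand-in for uneven compute): the first barrier absorbs the skew from uneven work, so timing the second barrier measures the barrier cost alone.

```python
import threading, time

N = 3
bar = threading.Barrier(N)
skewed = [0.0] * N    # time in a barrier entered with skewed arrivals
aligned = [0.0] * N   # time in a barrier entered right after another barrier

def worker(rank):
    time.sleep(0.05 * rank)      # uneven "compute" before the first barrier
    t0 = time.perf_counter()
    bar.wait()                   # first barrier: absorbs the skew
    skewed[rank] = time.perf_counter() - t0
    t1 = time.perf_counter()
    bar.wait()                   # second barrier: everyone arrives together
    aligned[rank] = time.perf_counter() - t1

threads = [threading.Thread(target=worker, args=(r,)) for r in range(N)]
for t in threads: t.start()
for t in threads: t.join()

print("skewed :", [round(w, 3) for w in skewed])   # rank 0 waits longest
print("aligned:", [round(w, 3) for w in aligned])  # all small
```

If the second (aligned) timings stay small in the real code while the first do not, the long waits come from skewed arrival, not from the barrier implementation.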
This behavior happens at every call (first and following)
Here is my code (simplified):
start_time = MPI_Wtime()
call mpi_ext_barrier()
new_time = MPI_Wtime()-start_time
write(local_time,'(F9.1)') new_time*1.0e9_WP/(36.0_WP*36.0_WP
If barrier/time/barrier/time solves your problem in each measure, that
means the computation above and below your barrier is not well "synchronized":
its overhead differs for each process. On the 2nd/3rd/... round, the
time to enter the barrier is too diverse, maybe ranging over [1, 1400]. This
barrier bec
I should know OMPI better than I do, but generally, when you make an MPI
call, you could be diving into all kinds of other stuff. E.g., with
non-blocking point-to-point operations, a message might make progress
during another MPI call. E.g.,
MPI_Irecv(recv_req)
MPI_Isend(send_req)
MPI_Wait(s
I guess you're perfectly right!
I will try to test it tomorrow by putting a call system("wait(X)") before the
barrier!
Thanks,
Ghislain.
PS:
if anyone has more information about the implementation of the MPI_IRECV()
procedure, I would be glad to learn more about it!
On Sep 8, 2011, at 17:35, Eug
I am seeing mpi_allreduce operations freeze execution of my code on some
moderately-sized problems. The freeze does not manifest itself in every
problem. In addition, it is in a portion of the code that is repeated many
times. In the problem discussed below, the freeze appears in the 60th
itera
Note also that coding the mpi_allreduce as:
call mpi_allreduce(MPI_IN_PLACE, phim(0,1,1,1,grp), &
     phim_size*im*jm*kmloc(coords(2)+1), mpi_real, mpi_sum, ang_com, ierr)
results in the same freezing behavior in the 60th iteration. (I don't
recall why the arrays were being passed, possibly just a mistak