Edgar,
I forgot to answer your previous question. I used MPI 1.5.4 and the C++ API.
Thatyene Ramos
On Mon, Apr 9, 2012 at 8:00 PM, Thatyene Louise Alves de Souza Ramos <
thaty...@gmail.com> wrote:
Hi Edgar, sorry about the late response. I've been travelling without
Internet access.
Well, I took the code Rodrigo provided and modified the client to do the dup after the creation of the new inter communicator (the one without 1 process). That is, I just replaced lines 54-55 in the *removeRank* method...
So just to confirm: I ran our test suite for inter-communicator collective
operations and communicator duplication, and everything still works.
Specifically, comm_dup on an intercommunicator is not fundamentally broken;
it worked in my tests.
To see precisely what your code does, can you please send me your test code?
Thanks
Edgar
On 4/4/2012 3:09 PM, Thatyene Louise Alves de Souza Ramos wrote:
Hi Edgar, thank you for the response.
Unfortunately, I've tried with and without this option. In both cases the result was the same... =(
On Wed, Apr 4, 2012 at 5:04 PM, Edgar Gabriel wrote:
did you try to start the program with the --mca coll ^inter switch that
I mentioned? Collective dup for intercommunicators should work; it's
probably again the bcast over a communicator of size 1 that is causing
the hang, and you could avoid it with the flag that I mentioned above.
Also, if you cou
Hi there.
I've made some tests related to the problem reported by Rodrigo. And I think,
though I'd rather be wrong, that collective calls like Create and Dup do not
work with inter communicators. I tried this in the client group:

MPI::Intercomm tmp_inter_comm;

tmp_inter_comm = server_comm.Crea
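For readers following along, here is a minimal sketch of the pattern being described (building a smaller inter-communicator on the client side and then duplicating it), written with the MPI-2 C++ bindings used in the thread. The use of Get_parent(), the group handling and all names are assumptions for illustration, not the poster's attached code.

    // Hypothetical reconstruction, not the poster's actual test code.
    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI::Init(argc, argv);

        // Inter-communicator connecting the client group to the server
        // (assumed here to come from Get_parent(); the real code may use
        // Connect()/Accept() instead).
        MPI::Intercomm server_comm = MPI::Comm::Get_parent();

        // Build a sub-group of the local (client) group without one rank.
        int excluded[] = { 0 };                       // rank to drop (assumption)
        MPI::Group sub_group = server_comm.Get_group().Excl(1, excluded);

        // Both calls are collective over the inter-communicator.
        MPI::Intercomm tmp_inter_comm = server_comm.Create(sub_group);
        MPI::Intercomm dup_comm;
        if (tmp_inter_comm != MPI::COMM_NULL)
            dup_comm = tmp_inter_comm.Dup();          // the call reported to hang

        MPI::Finalize();
        return 0;
    }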
it just uses a different algorithm, which avoids the bcast on a
communicator of size 1 (which is what is causing the problem here).
Thanks
Edgar
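For readers who want to see what that component is, the commands below show one way to inspect and exclude the coll inter module at run time; the exact ompi_info syntax is an assumption against Open MPI 1.5-era releases.

    # list the MCA parameters exposed by the 'inter' collective component
    ompi_info --param coll inter

    # run while excluding the 'inter' collective component, as suggested above
    mpirun -np 1 --mca coll ^inter ./server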
On 3/28/2012 12:08 PM, Rodrigo Oliveira wrote:
Hi Edgar,
I tested the execution of my code using the option -mca coll ^inter as you
suggested and the program worked fine, even when I use 1 server instance.
What does this parameter actually change? I did not find an explanation of
how the coll inter module is used.
Thanks a
Hi Edgar.
Thanks for the response. I just did not understand why the Barrier works
before I remove one of the client processes.
I tried it with 1 server and 3 clients and it worked properly. After I
removed 1 of the clients, it stopped working. So the removal is affecting
the functionality of Barrier
Yes and no. So first, here is a quick fix for you: if you start the
server using
mpirun -np 2 -mca coll ^inter ./server
your test code finishes (with one minor modification to your code,
namely the process that is being excluded on the client side does need a
condition to leave the while loop as
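A sketch (an assumption, not Rodrigo's attached client) of the kind of exit condition Edgar means: the rank that was excluded from the new inter-communicator ends up with MPI::COMM_NULL and must not enter the work loop, otherwise it spins forever and the test never finishes.

    #include <mpi.h>

    // work_comm is whatever server_comm.Create(sub_group) returned on this
    // rank (MPI::COMM_NULL on the excluded rank); the names are made up.
    void client_loop(MPI::Intercomm work_comm) {
        bool participating = (work_comm != MPI::COMM_NULL);
        while (participating) {
            // ... receive a request from the server over work_comm and handle it ...

            participating = false;   // placeholder for the real stop condition
        }
        // The excluded rank falls through immediately and can reach
        // MPI::Finalize() together with everyone else.
    }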
Hi Edgar,
Did you take a look at my code? Any idea about what is happening? I did a
lot of tests and it does not work.
Thanks
On Tue, Mar 20, 2012 at 3:43 PM, Rodrigo Oliveira wrote:
The command I use to compile and run is:
mpic++ server.cc -o server && mpic++ client.cc -o client && mpirun -np 1
./server
Rodrigo
On Tue, Mar 20, 2012 at 3:40 PM, Rodrigo Oliveira wrote:
Hi Edgar.
Thanks for the response. The simplified code is attached: server, client
and a .h containing some constants. I put some "prints" to show the
behavior.
Regards
Rodrigo
On Tue, Mar 20, 2012 at 11:47 AM, Edgar Gabriel wrote:
do you have by any chance the actual code or a small reproducer? It might be
much easier to hunt the problem down...
Thanks
Edgar
On 3/19/2012 8:12 PM, Rodrigo Oliveira wrote:
> Hi there.
>
> I am facing a very strange problem when using MPI_Barrier over an
> inter-communicator after some operations
Thank you: this is very enlightening.
I will try this and let you know...
Ghislain.
On Sep 9, 2011, at 18:00, Eugene Loh wrote:
On 9/8/2011 11:47 AM, Ghislain Lartigue wrote:
I guess you're perfectly right!
I will try to test it tomorrow by putting a call system("wait(X)") before the barrier!
What does "wait(X)" mean?
Anyhow, here is how I see your computation:
A) The first barrier simply synchronizes the processes.
I guess you're perfectly right!
I will try to test it tomorrow by putting a call system("wait(X)") before the barrier!
Thanks,
Ghislain.
PS:
if anyone has more information about the implementation of the MPI_IRECV()
procedure, I would be glad to learn more about it!
On Sep 8, 2011, at 17:35, Eugene Loh wrote:
I should know OMPI better than I do, but generally, when you make an MPI
call, you could be diving into all kinds of other stuff. E.g., with
non-blocking point-to-point operations, a message might make progress
during another MPI call. E.g.,
MPI_Irecv(recv_req)
MPI_Isend(send_req)
MPI_Wait(send_req)
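As a concrete illustration of that progress remark (a generic two-rank sketch, not code from the thread), the pending receive below may be advanced while the library is sitting inside the MPI_Wait on the send request:

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int peer = 1 - rank;                 // assumes exactly 2 ranks
        int send_val = rank, recv_val = -1;
        MPI_Request recv_req, send_req;

        MPI_Irecv(&recv_val, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &recv_req);
        MPI_Isend(&send_val, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &send_req);

        // While blocked here on the send request, the implementation may also
        // advance the pending receive: progress happens inside MPI calls.
        MPI_Wait(&send_req, MPI_STATUS_IGNORE);
        MPI_Wait(&recv_req, MPI_STATUS_IGNORE);

        std::printf("rank %d received %d\n", rank, recv_val);
        MPI_Finalize();
        return 0;
    }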
If barrier/time/barrier/time solves your problem in each measurement, that
means the computation above/below your barrier is not very "synchronized":
its overhead differs from process to process. On the 2nd/3rd/... round, the
times at which processes enter the barrier are then very spread out, maybe
ranging over [1, 1400]. This barrier bec
This behavior happens at every call (first and following).
Here is my code (simplified):
start_time = MPI_Wtime()
call mpi_ext_barrier()
new_time = MPI_Wtime()-start_time
write(local_time,'(F9.1)') new_time*1.0e9_WP/(36.0_WP*36.0_WP*36.0_WP)
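For comparison, here is a compact sketch (generic C-style MPI from C++, not the poster's Fortran) of the barrier/time/barrier/time idea suggested just below: the first barrier absorbs the load imbalance, so the timed second barrier measures only the barrier itself.

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        // ... the (possibly unbalanced) computation would go here ...

        MPI_Barrier(MPI_COMM_WORLD);              // absorbs the imbalance
        double t0 = MPI_Wtime();
        MPI_Barrier(MPI_COMM_WORLD);              // the barrier being measured
        double barrier_time = MPI_Wtime() - t0;

        std::printf("rank %d: barrier took %.3f us\n", rank, barrier_time * 1e6);

        MPI_Finalize();
        return 0;
    }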
do
barrier/time/barrier/time
and run your code again.
Teng
On 9/8/2011 7:42 AM, Ghislain Lartigue wrote:
I will check that, but as I said in first email, this strange behaviour happens
only in one place in my code.
Is the strange behavior on the first time, or much later on? (You seem
to imply later on, but I thought I'd ask.)
I agree the behavior i
And to be precise, the units I use are not the direct result of MPI_Wtime():
new_time = (MPI_Wtime()-start_time)*1e9/(36^3)
This means that you should multiply these times by ~20'000 to have ticks...
On Sep 8, 2011, at 16:42, Ghislain Lartigue wrote:
I will check that, but as I said in first email, this strange behaviour happens
only in one place in my code.
I have the same time/barrier/time procedure in other places (in the same code)
and it works perfectly.
At one place I have the following output (sorted):
<00>(0) CAST GHOST DATA1 LOOP 1 b
I agree sentimentally with Ghislain. The time spent in a barrier
should conceptually be some wait time, which can be very long (possibly
on the order of milliseconds or even seconds), plus the time to execute
the barrier operations, which should essentially be "instantaneous" on
some time scal
You'd better check the process-core binding in your case. It looks to me
like P0 and P1 are on the same node and P2 is on another node, which makes
the ack to P0/P1 go through shared memory and the ack to P2 go over the
network.
1000x is very possible: sm latency can be about 0.03 microsec, while
Ethernet latency is about 20-30 microsec.
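One quick way to check the placement Teng describes is mpirun's binding report; treat the exact option name as an assumption for the Open MPI version in use.

    # print where each rank is bound before the job starts
    mpirun -np 3 --report-bindings ./a.out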
What tick value are you using (i.e., what units are you using)?
On Thu, Sep 8, 2011 at 10:25 AM, Ghislain Lartigue <
ghislain.larti...@coria.fr> wrote:
Thanks,
I understand this but the delays that I measure are huge compared to a
classical ack procedure... (1000x more)
And this is repeatable: as far as I understand it, this shows that the network
is not involved.
Ghislain.
On Sep 8, 2011, at 16:16, Teng Ma wrote:
I guess you forgot to count the "leaving time" (fan-out). When everyone
hits the barrier, it still needs an "ack" to leave. And remember that in
most cases the leader process will send out the "acks" sequentially. It's
very possible that:
P0 barrier time = 29 + send/recv ack 0
P1 barrier time = 14 + send ack 0
On Thu Sep 8, 2011 15:41:57, Ghislain Lartigue wrote:
These "times" have no units, it's just an example...
Whatever units are used, at least one process should spend a very small amount
of time in the barrier (compared to the other processes), and this is not what
I see in my code.
The network is supposed to be excellent: my machine is #9 in the top500
su
On Sep 8, 2011, at 9:17 AM, Ghislain Lartigue wrote:
> Example with 3 processes:
>
> P0 hits barrier at t=12
> P1 hits barrier at t=27
> P2 hits barrier at t=41
What is the unit of time here, and how well are these times synchronized?
> In this situation:
> P0 waits 41-12 = 29
> P1 waits 41-27
This problem has nothing to do with stdout...
Example with 3 processes:
P0 hits barrier at t=12
P1 hits barrier at t=27
P2 hits barrier at t=41
In this situation:
P0 waits 41-12 = 29
P1 waits 41-27 = 14
P2 waits 41-41 = 00
So I should see something like (no ordering is expected):
barrier_time =
The order in which you see stdout printed from mpirun is not necessarily
reflective of what order things were actually printed. Remember that the
stdout from each MPI process needs to flow through at least 3 processes and
potentially across the network before it is actually displayed on mpirun
Thank you for this explanation but indeed this confirms that the LAST process
that hits the barrier should go through nearly instantaneously (except for the
broadcast time for the acknowledgment signal).
And this is not what happens in my code: EVERY process waits for a very long
time before going through.
The order in which processes hit the barrier is only one factor in the time it
takes for each process to finish the barrier.
An easy way to think of a barrier implementation is a "fan in/fan out" model.
When each nonzero-rank process calls MPI_BARRIER, it sends a message saying "I
have hit the barrier"
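To make the fan-in/fan-out picture concrete, here is a toy linear barrier built from point-to-point messages. It is a sketch of the model being described, not Open MPI's actual algorithm.

    #include <mpi.h>

    void toy_barrier(MPI_Comm comm) {
        int rank, size, token = 0;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        if (rank == 0) {
            // fan-in: collect one "I have hit the barrier" message per rank
            for (int r = 1; r < size; ++r)
                MPI_Recv(&token, 1, MPI_INT, r, 0, comm, MPI_STATUS_IGNORE);
            // fan-out: release the other ranks one by one; these sequential
            // acks are why the "leaving time" differs from rank to rank
            for (int r = 1; r < size; ++r)
                MPI_Send(&token, 1, MPI_INT, r, 1, comm);
        } else {
            MPI_Send(&token, 1, MPI_INT, 0, 0, comm);                      // fan-in
            MPI_Recv(&token, 1, MPI_INT, 0, 1, comm, MPI_STATUS_IGNORE);   // fan-out
        }
    }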