This sounds a bit like the MPI_Alltoallv algorithm change I complained about when 1.6.1 was released.

Original post: http://www.open-mpi.org/community/lists/users/2012/11/20722.php
Everything waits for "rank 0" observation: http://www.open-mpi.org/community/lists/users/2013/01/21219.php

Does switching to the older algorithm help?
mpiexec --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_alltoallv_algorithm 1
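
The same parameters can also be set through the environment instead of on the mpiexec line, and ompi_info will list what the tuned collective component accepts. A sketch, assuming a hypothetical binary ./a2a:

  # List the tuned collective MCA parameters (algorithm choices etc.)
  ompi_info --param coll tuned

  # Equivalent to the --mca flags above, via the environment
  export OMPI_MCA_coll_tuned_use_dynamic_rules=1
  export OMPI_MCA_coll_tuned_alltoallv_algorithm=1
  mpiexec -np 4 ./a2a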

Simon

On 26/04/2013 23:14, Stephan Wolf wrote:
Hi,

I have encountered really bad performance when all the nodes send data
to all the other nodes. I use Isend and Irecv with multiple outstanding
sends per node. I debugged the behaviour and came to the following
conclusion: it seems that one sender locks out all other senders for a
given receiver. This sender releases the receiver only when it has no
more sends posted, or when a node with a lower rank wants to send to
this node (deadlock prevention). As a consequence, node 0 sends all its
data to all nodes while all the others wait, then node 1 sends all its
data, …
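
In case it helps to reproduce this, a minimal sketch of the pattern I mean (payload size made up, error checking omitted):

  #include <mpi.h>
  #include <stdlib.h>
  #include <string.h>

  int main(int argc, char **argv)
  {
      /* Every rank posts nonblocking receives from all peers and
         nonblocking sends to all peers, then waits for completion. */
      int rank, size, i;
      const int N = 1 << 20;   /* 1 MiB per peer, arbitrary */

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      char *sendbuf = malloc((size_t)size * N);
      char *recvbuf = malloc((size_t)size * N);
      MPI_Request *reqs = malloc(2 * (size_t)size * sizeof(MPI_Request));
      memset(sendbuf, rank, (size_t)size * N);   /* contents irrelevant */

      for (i = 0; i < size; i++)   /* receives first, then the sends */
          MPI_Irecv(recvbuf + (size_t)i * N, N, MPI_CHAR, i, 0,
                    MPI_COMM_WORLD, &reqs[i]);
      for (i = 0; i < size; i++)   /* many outstanding sends per node */
          MPI_Isend(sendbuf + (size_t)i * N, N, MPI_CHAR, i, 0,
                    MPI_COMM_WORLD, &reqs[size + i]);

      MPI_Waitall(2 * size, reqs, MPI_STATUSES_IGNORE);

      free(sendbuf); free(recvbuf); free(reqs);
      MPI_Finalize();
      return 0;
  }

I would expect the receives on each node to complete roughly concurrently, but instead they complete rank by rank, as described above.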

What is the rationale behind this behaviour, and can I change it with
an MCA parameter?

Thanks

Stephan
