This sounds a bit like the Alltoallv algorithm change I complained
about when 1.6.1 was released.
Original post:
http://www.open-mpi.org/community/lists/users/2012/11/20722.php
Everything waits for "rank 0" observation:
http://www.open-mpi.org/community/lists/users/2013/01/21219.php
Does switching to the older algorithm help? Try:

mpiexec --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_alltoallv_algorithm 1
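
For a quick A/B comparison of the two settings, a small MPI_Alltoallv
timing loop is enough. The sketch below is only illustrative; COUNT,
ITERS and the buffer layout are my assumptions, not anything from the
1.6.1 threads:

/* alltoallv_bench.c - minimal MPI_Alltoallv timing loop.
 * Run once normally and once with the MCA flags above. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define COUNT 65536   /* ints sent to each peer (assumed size) */
#define ITERS 20      /* timing iterations (assumed) */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *sendcounts = malloc(size * sizeof(int));
    int *recvcounts = malloc(size * sizeof(int));
    int *sdispls    = malloc(size * sizeof(int));
    int *rdispls    = malloc(size * sizeof(int));
    for (int i = 0; i < size; i++) {
        sendcounts[i] = recvcounts[i] = COUNT;
        sdispls[i] = rdispls[i] = i * COUNT;
    }
    int *sendbuf = malloc((size_t)size * COUNT * sizeof(int));
    int *recvbuf = malloc((size_t)size * COUNT * sizeof(int));
    for (size_t i = 0; i < (size_t)size * COUNT; i++)
        sendbuf[i] = (int)i;   /* arbitrary payload */

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int it = 0; it < ITERS; it++)
        MPI_Alltoallv(sendbuf, sendcounts, sdispls, MPI_INT,
                      recvbuf, recvcounts, rdispls, MPI_INT,
                      MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("avg alltoallv time: %g s\n", (t1 - t0) / ITERS);

    free(sendbuf); free(recvbuf);
    free(sendcounts); free(recvcounts); free(sdispls); free(rdispls);
    MPI_Finalize();
    return 0;
}

If it is easier to test without editing the mpiexec line, the same MCA
parameters can also be set as environment variables
(OMPI_MCA_coll_tuned_use_dynamic_rules=1 and
OMPI_MCA_coll_tuned_alltoallv_algorithm=1).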
Simon
On 26/04/2013 23:14, Stephan Wolf wrote:
Hi,
I have encountered really bad performance when all nodes send data to
all the other nodes. I use Isend and Irecv with multiple outstanding
sends per node. I debugged the behaviour and came to the following
conclusion: one sender seems to lock out all other senders for a given
receiver, and releases that receiver only when it has no more sends
posted or when a node with a lower rank wants to send to this node
(deadlock prevention). As a consequence, node 0 sends all its data to
all nodes while everyone else waits, then node 1 sends all its data, …
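
In code, the pattern is roughly the following (a minimal sketch for
context; MSG_LEN and the flat buffer layout are assumptions, not my
actual code):

/* Every rank posts nonblocking receives from, and sends to, every
 * other rank, then waits on all requests at once. */
#include <mpi.h>
#include <stdlib.h>

#define MSG_LEN 1048576  /* bytes per peer (assumed) */

void all_to_all_isend(char *sendbuf, char *recvbuf, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    MPI_Request *reqs = malloc(2 * (size - 1) * sizeof(MPI_Request));
    int n = 0;

    for (int peer = 0; peer < size; peer++) {
        if (peer == rank) continue;
        MPI_Irecv(recvbuf + (size_t)peer * MSG_LEN, MSG_LEN, MPI_BYTE,
                  peer, 0, comm, &reqs[n++]);
    }
    /* Multiple outstanding sends per node, as described above. */
    for (int peer = 0; peer < size; peer++) {
        if (peer == rank) continue;
        MPI_Isend(sendbuf + (size_t)peer * MSG_LEN, MSG_LEN, MPI_BYTE,
                  peer, 0, comm, &reqs[n++]);
    }
    MPI_Waitall(n, reqs, MPI_STATUSES_IGNORE);
    free(reqs);
}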
What is the rationale behind this behaviour, and can I change it with
an MCA parameter?
Thanks
Stephan