Hello. Sorry for the delay in confirming the minimum load that would trigger the RnR error; the holidays here were a significant interruption.
On Mon, Dec 19, 2011, at 03:30 PM, Yevgeny Kliteynik wrote:
> What's the smallest number of nodes that are needed to reproduce this
> problem? Does it happen with just two HCAs, one process per node?

Our nodes with these HCAs are dual-socket, with 4 Intel cores per socket. Working with the users, we found that we could not reproduce the issue with anything less than 3 nodes and 17 processes total, with no node oversubscribed: two nodes ran 8 processes each and the third ran 1 process. It could be some sort of race condition or timing issue that could theoretically be triggered with a smaller configuration, but we weren't able to provoke it.

> Let's get you to the latest firmware GA of this card.

Just as a reminder, I responded to the firmware part of this earlier:

http://www.open-mpi.org/community/lists/users/2011/12/18014.php

Thank you,
V. Ram