I have recently seen some OpenIB timeout errors, and the error output
includes the following:

 * btl_openib_ib_retry_count - The number of times the sender will
   attempt to retry (defaulted to 7, the maximum value).
 * btl_openib_ib_timeout - The local ACK timeout parameter (defaulted
   to 10).  The actual timeout value used is calculated as:

I'd like to confirm that, when those messages say "defaulted to",
they are telling me what's happening on the node in question and
not just what the default is.

The reason for asking is that I believe I am setting the value of
btl_openib_ib_timeout to 20, globally, as suggested in parts of the
Open MPI docs, but those messages, if they do report what is actually
in effect, might be telling me otherwise.
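
To show why the distinction matters, here is a rough sketch of the
difference between the two settings. It assumes the formula is the
one I recall from the full Open MPI help text, namely
4.096 microseconds * 2^btl_openib_ib_timeout; the function and
constant names are my own, purely for illustration.

```python
# Assumed formula (recalled from the Open MPI help text, not confirmed here):
#   timeout = 4.096 microseconds * 2 ** btl_openib_ib_timeout
BASE_SECONDS = 4.096e-6  # 4.096 microseconds


def local_ack_timeout_seconds(ib_timeout: int) -> float:
    """Return the local ACK timeout in seconds for a given exponent."""
    return BASE_SECONDS * (2 ** ib_timeout)


if __name__ == "__main__":
    # Compare the reported default (10) with the value I think I set (20).
    for value in (10, 20):
        seconds = local_ack_timeout_seconds(value)
        print(f"btl_openib_ib_timeout={value}: {seconds:.6f} s per attempt")
```

If that formula is right, 10 gives a timeout of a few milliseconds
per attempt, whereas 20 gives a few seconds, so which value is really
in effect makes a large difference.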

In case it is relevant, the Open MPI in question is the bog-standard
RHEL5 build, version 1.4.4.

-- 
Kevin M. Buckley                                  Room:  CO327
School of Engineering and                         Phone: +64 4 463 5971
 Computer Science
Victoria University of Wellington
New Zealand
