Hi Kevin,

Are you getting those messages from ompi_info? Or from an MPI app (and if so, 
what are you doing to get them)?
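
For reference, here is a rough sketch of what the timeout parameter means in
wall-clock terms. It assumes the formula from the Open MPI help text that the
quoted message below cuts off, namely 4.096 microseconds *
(2^btl_openib_ib_timeout). The function name is made up for illustration and
is not part of Open MPI.

```python
# Assumption: local ACK timeout = 4.096 microseconds * 2^btl_openib_ib_timeout,
# as stated in the Open MPI openib help text (truncated in the quoted message).
BASE_MICROSECONDS = 4.096


def ack_timeout_seconds(ib_timeout: int) -> float:
    """Return the local ACK timeout in seconds for a given parameter value."""
    return BASE_MICROSECONDS * (2 ** ib_timeout) / 1e6


if __name__ == "__main__":
    # Compare the reported default (10) with the value Kevin intends to set (20).
    for value in (10, 20):
        print(f"btl_openib_ib_timeout={value}: {ack_timeout_seconds(value):.6f} s")
```

Under that assumption, the default of 10 is roughly 4.2 milliseconds per
attempt, and 20 is roughly 4.3 seconds, so which value is actually in effect
makes a large difference.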


On Sep 11, 2011, at 5:25 PM, kevin.buck...@ecs.vuw.ac.nz wrote:

> I have recently seen some OpenIB timeout errors, and the
> following is reported:
> 
> * btl_openib_ib_retry_count - The number of times the sender will
>   attempt to retry (defaulted to 7, the maximum value).
> * btl_openib_ib_timeout - The local ACK timeout parameter (defaulted
>   to 10).  The actual timeout value used is calculated as:
> 
> I'd like to confirm that, when those messages say "defaulted to",
> they are telling me what's happening on the node in question and
> not just what the default is.
> 
> The reason for asking is that I believe I am setting the value of
> btl_openib_ib_timeout to 20, globally, as suggested in parts of the
> OpenMPI docs, but those messages, if they do report what's happening,
> might be telling me otherwise.
> 
> In case it is relevant, the OpenMPI in question is the bog standard
> RHEL5 1.4.4.
> 
> -- 
> Kevin M. Buckley                                  Room:  CO327
> School of Engineering and                         Phone: +64 4 463 5971
> Computer Science
> Victoria University of Wellington
> New Zealand

