Sorry for jumping in late; the holiday and other travel prevented me
from getting to all my mail recently... :-\
Have you checked the counters on the subnet manager to see if any
other errors are occurring? It might be good to clear all the
counters, run the job, and see if the counters are increasing faster
than they should (i.e., any particular counter should advance very,
very slowly -- perhaps 1 per day or so).
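To make the "clear, run, compare" idea concrete, here is a minimal sketch in Python. It assumes you have already collected two snapshots of the port counters by whatever means your subnet manager or diagnostics tools provide. The helper names, the counter names, and the sample values are all hypothetical and only for illustration; the 1-per-day threshold is the rough figure mentioned above, not a documented limit.

```python
SECONDS_PER_DAY = 86400.0


def counter_rates_per_day(before, after, elapsed_seconds):
    """Return how fast each counter advanced, in increments per day.

    `before` and `after` are dicts mapping counter name -> value,
    taken `elapsed_seconds` apart (hypothetical snapshot format).
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    days = elapsed_seconds / SECONDS_PER_DAY
    # Counters missing from the first snapshot are treated as freshly cleared.
    return {name: (after[name] - before.get(name, 0)) / days for name in after}


def suspicious_counters(rates, max_per_day=1.0):
    """Return the names of counters advancing faster than max_per_day."""
    return sorted(name for name, rate in rates.items() if rate > max_per_day)


if __name__ == "__main__":
    # Made-up sample values: counters cleared, then read again after a 1-hour job.
    before = {"SymbolErrorCounter": 0, "PortRcvErrors": 0}
    after = {"SymbolErrorCounter": 120, "PortRcvErrors": 0}
    rates = counter_rates_per_day(before, after, elapsed_seconds=3600)
    print(suspicious_counters(rates))
```

Any counter this flags is only a hint about where to look next; it does not by itself say which cable, port, or HCA is at fault.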
I'll ask around the kernel-level guys (i.e., Roland) to see what else
could cause this kind of error.
On Nov 27, 2007, at 3:35 PM, Brock Palen wrote:
OK, I will open a case with Cisco.
Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
On Nov 27, 2007, at 4:19 PM, Andrew Friedley wrote:
Brock Palen wrote:
What would be a place to look? Should this just be the default then for
OMPI? ompi_info shows the default as 10 seconds? Is 'seconds' right?
The other IB guys can probably answer better than I can -- I'm not an
expert in this part of IB (or really any part, I guess :). Not sure why
a larger value isn't the default. No, it's not seconds -- check the
description of the MCA parameter:
4.096 microseconds * (2^btl_openib_ib_timeout)
You sure?
ompi_info --param btl openib
MCA btl: parameter "btl_openib_ib_timeout" (current value: "10")
InfiniBand transmit timeout, in seconds
(must be >= 1)
Yeah:
MCA btl: parameter "btl_openib_ib_timeout" (current value: "10")
InfiniBand transmit timeout, plugged into formula:
4.096 microseconds * (2^btl_openib_ib_timeout) (must be >= 0 and <= 31)
Reading earlier in the thread, you said OMPI v1.2.0; I got this from a
trunk checkout that's around 3 weeks old. A quick check shows this
description was changed between 1.2.0 and 1.2.1. However, the use of
this parameter hasn't changed -- it's simply passed along to IB verbs
when creating a queue pair (aka a connection).
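For a feel of what the formula means in practice, here is a small Python sketch that evaluates 4.096 microseconds * (2^btl_openib_ib_timeout) for a few values. The function name is made up for illustration, and the 0..31 range check simply mirrors the parameter description quoted above; the values 14 and 20 are arbitrary examples, not recommendations.

```python
def ib_timeout_seconds(exponent: int) -> float:
    """Evaluate 4.096 microseconds * 2**exponent and return it in seconds."""
    if not 0 <= exponent <= 31:
        raise ValueError("btl_openib_ib_timeout must be >= 0 and <= 31")
    return 4.096e-6 * (2 ** exponent)


if __name__ == "__main__":
    # 10 is the default shown by ompi_info; 14 and 20 are arbitrary examples.
    for value in (10, 14, 20):
        millis = ib_timeout_seconds(value) * 1e3
        print(f"btl_openib_ib_timeout={value}: {millis:.3f} ms")
```

So the default of 10 works out to roughly 4.2 milliseconds, not 10 seconds, and each increment of the parameter doubles the timeout.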
Andrew
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
--
Jeff Squyres
Cisco Systems