Hello Brock,
While it won't solve the underlying problem, have you tried increasing the btl
timeouts as the message suggests? With 1884 cores in use, perhaps there
is some oversubscription in the fabric?
-Joshua Bernstein
Penguin Computing
Brock Palen wrote:
We recently installed a modest IB network on our cluster.
When running a 1884-core IB HPL job, we get an error about IB after a run.
It does not always happen in the same place; some iterations will pass and others
will fail. The error is below. We are using openmpi/1.4.2 with the intel 11
compi