On Dec 6, 2006, at 2:29 PM, Brock Palen wrote:


I wonder if we can narrow this down a bit to perhaps a PML protocol
issue.
Start by disabling RDMA by using:
-mca btl_gm_flags 1

On the other hand, using btl_gm_flags 1 with OB1 fixed the error
problem with OMPI, which is a great first step.

mpirun -np 4 --mca btl_gm_flags 1 ./xhpl

That allowed HPL to run with no errors.  I verified that the performance
was better than when run without gm (that is, with --mca btl ^gm added).

So there is still a problem with DR, which I don't need, but I'm willing
to help test it.

Scott,

Can we look into why leaving RDMA on is causing a problem?

Brock

Brock and Galen,

We are willing to assist. Our best guess is that OMPI is using the code in a different way than MPICH-GM does. One of our other developers who is more comfortable with the GM API is looking into it.

Testing with HPCC, in addition to the failed HPL residuals, I am also seeing these messages:

[3]: ERROR: from right: expected 2 and 3 as first and last byte, but got 2 and 5 instead
[3]: ERROR: from right: expected 3 and 4 as first and last byte, but got 3 and 7 instead
[1]: ERROR: from right: expected 4 and 5 as first and last byte, but got 4 and 3 instead
[1]: ERROR: from right: expected 7 and 8 as first and last byte, but got 7 and 5 instead

which is from $HPCC/src/bench_lat_bw_1.5.2.c.
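
For anyone who wants to reproduce this kind of check outside HPCC, here is a minimal sketch of a first/last byte verification. It is not the code from bench_lat_bw_1.5.2.c; the helper names stamp_message and check_message are made up for illustration, and only the wording of the error line is copied from the output above. In all four reported messages the first byte matches and only the last byte differs, which is the case the second check in this sketch would catch.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not from HPCC): the sender marks the first and
 * last byte of a message buffer with known values before sending. */
static void stamp_message(unsigned char *buf, size_t len,
                          unsigned char first, unsigned char last)
{
    if (len == 0)
        return;
    buf[0] = first;
    buf[len - 1] = last;
}

/* Hypothetical helper (not from HPCC): the receiver compares the first
 * and last byte against the expected values. Returns 0 when both match,
 * 1 otherwise, and reports the mismatch on stderr. */
static int check_message(int rank, const unsigned char *buf, size_t len,
                         unsigned char first, unsigned char last)
{
    if (len == 0)
        return 0;
    if (buf[0] == first && buf[len - 1] == last)
        return 0;
    fprintf(stderr,
            "[%d]: ERROR: from right: expected %u and %u as first and "
            "last byte, but got %u and %u instead\n",
            rank, (unsigned)first, (unsigned)last,
            (unsigned)buf[0], (unsigned)buf[len - 1]);
    return 1;
}
```

In a real test the buffer would travel through MPI_Send/MPI_Recv between the stamp and the check. A correct head with a wrong tail means the end of the buffer did not hold the expected data when it was checked.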

Scott
