I wonder if we can narrow this down a bit to perhaps a PML protocol
issue.
Start by disabling RDMA by using:
-mca btl_gm_flags 1

This helps some. I at least now see HPL start up, but I never get a single pass; the output ends at:

- Computational tests pass if scaled residuals are less than 16.0

On the other hand, with OB1, using btl_gm_flags 1 fixed the error problem with OMPI, which is a great first step.

mpirun -np 4 --mca btl_gm_flags 1 ./xhpl

This allowed HPL to run with no errors. I verified that the performance was better than a run without gm (that is, with --mca btl ^gm added).

So there is still a problem with DR, which I don't need, but I'm willing to help test it.
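
For reference, the runs compared above might be laid out as in the sketch below. It only prints the command lines rather than launching anything. The hpl_cmd helper and the configuration names are hypothetical, and selecting the PML with "--mca pml ob1" or "--mca pml dr" is an assumption: the thread names OB1 and DR but does not show how each was selected.

```shell
#!/bin/sh
# Dry-run sketch: prints mpirun command lines instead of executing them.
# Assumption: the PML is chosen with "--mca pml <name>"; the thread only
# names OB1 and DR and does not show the selection flags.

# hpl_cmd is a hypothetical helper that maps a configuration name to a command.
hpl_cmd() {
    case "$1" in
        ob1_no_rdma) echo "mpirun -np 4 --mca pml ob1 --mca btl_gm_flags 1 ./xhpl" ;;
        dr_no_rdma)  echo "mpirun -np 4 --mca pml dr --mca btl_gm_flags 1 ./xhpl" ;;
        no_gm)       echo "mpirun -np 4 --mca btl ^gm ./xhpl" ;;
        *)           echo "unknown configuration: $1" >&2; return 1 ;;
    esac
}

# List each configuration next to the command it corresponds to.
for cfg in ob1_no_rdma dr_no_rdma no_gm; do
    printf '%s: %s\n' "$cfg" "$(hpl_cmd "$cfg")"
done
```

Running each printed command in turn and comparing the HPL output would separate a gm RDMA problem (fixed by btl_gm_flags 1) from a DR-specific one.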

Scott,

Can we look into why leaving RDMA on is causing a problem?

Brock

Let's see if that helps things out at all.

- Galen


Scott
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
