On Dec 5, 2006, at 6:15 PM, Galen M. Shipman wrote:

Brock Palen wrote:

I was asked by Myricom to run a test using the data reliability (dr) pml.
I ran it like so:

$ mpirun  --mca pml dr -np 4 ./xhpl

Is this the right format for running the dr pml?

This should be fine, yes.
I can run HPL on our test cluster to see if something is wrong with DR.
I assume you are using GM and not MX?

He is running GM.

Can you try running a simple ping-pong to make sure we have the basics
on this platform?
If you have access to them, running the Intel test suite would also be
helpful in determining if/where we have an issue.

He has run IMB compiled with -DCHECK and it did not report any errors.

Are there any gotchas on using the dr pml?
Also, if the dr pml is finding errors and resending data, can I
have it tell me when that happens?  Like a verbose mode?

Unfortunately you will need to update the source and recompile. Try
updating this file:

topdir/ompi/mca/pml/dr/pml_dr.h:245:#define MCA_PML_DR_DEBUG_LEVEL -1

and change MCA_PML_DR_DEBUG_LEVEL to 0.
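
The one-line source change described above can also be scripted. What follows is only a sketch in Python: the helper name set_dr_debug_level is hypothetical, and it assumes the define sits on a single line in pml_dr.h, as the file:line reference suggests. It rewrites the text of the header and refuses to guess if it does not find exactly one matching define.

```python
import re

# Matches "#define MCA_PML_DR_DEBUG_LEVEL <integer>" at the start of a line.
_DEFINE = re.compile(
    r"^([ \t]*#[ \t]*define[ \t]+MCA_PML_DR_DEBUG_LEVEL[ \t]+)-?\d+",
    re.MULTILINE,
)


def set_dr_debug_level(source_text, level=0):
    """Return source_text with MCA_PML_DR_DEBUG_LEVEL set to `level`.

    Hypothetical helper: it only edits the text it is given; reading
    pml_dr.h, writing it back, and recompiling are left to the caller.
    """
    new_text, count = _DEFINE.subn(
        lambda m: m.group(1) + str(level), source_text
    )
    if count != 1:
        raise ValueError("expected exactly one define, found %d" % count)
    return new_text
```

Apply it to the contents of pml_dr.h under your own source tree (topdir above is a placeholder for that tree), write the result back, and rebuild Open MPI as you normally would.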

The problem is that, when running HPL, he sees failed residuals. When running HPL under MPICH-GM, he does not.
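
For context on what a failed residual means: after solving Ax = b, HPL compares a scaled residual against a threshold and reports the run as passed or failed. Below is a minimal sketch in Python, not HPL's own code. It assumes the single-ratio form ||Ax-b||_oo / (eps * (||A||_oo * ||x||_oo + ||b||_oo) * N) and the 16.0 threshold that, as far as I know, recent HPL versions use; older versions report several related ratios. The helper names are hypothetical, and eps here is Python's machine epsilon, which may differ from HPL's by a factor of two.

```python
import sys

THRESHOLD = 16.0  # assumed pass/fail cutoff; HPL reads its own from HPL.dat


def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for k in range(n):
        # Swap in the row with the largest pivot candidate in column k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def scaled_residual(a, x, b):
    """Infinity-norm residual of a*x - b, scaled as described above."""
    n = len(a)
    eps = sys.float_info.epsilon
    r = max(
        abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n)
    )
    norm_a = max(sum(abs(v) for v in row) for row in a)  # max row sum
    norm_x = max(abs(v) for v in x)
    norm_b = max(abs(v) for v in b)
    return r / (eps * (norm_a * norm_x + norm_b) * n)


def residual_passes(a, x, b):
    """True if the scaled residual is below the assumed threshold."""
    return scaled_residual(a, x, b) < THRESHOLD
```

With a well-conditioned matrix, ordinary rounding keeps this ratio small. Perturbing a single entry of x by something like 1e-6 pushes it many orders of magnitude past the threshold, which is why a failed residual is usually read as a sign of corrupted data somewhere in the run and not of normal floating-point noise.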

I have tried running HPCC (HPL plus other benchmarks) using OMPI with GM on 32-bit Xeons and 64-bit Opterons. I do not see any failed residuals. I am trying to get access to a couple of OSX machines to replicate Brock's setup.

Scott
