On Dec 5, 2006, at 6:15 PM, Galen M. Shipman wrote:
Brock Palen wrote:
I was asked by Myricom to run a test using the data reliability (dr)
pml. I ran it like so:
$ mpirun --mca pml dr -np 4 ./xhpl
Is this the right format for running the dr pml?
This should be fine, yes.
I can run HPL on our test cluster to see if something is wrong
with DR.
I assume you are using GM and not MX?
He is running GM.
Can you try running a simple ping-pong to make sure we have the basics
on this platform?
If you have access to them, running the Intel test suite would also be
helpful in determining if/where we have an issue.
He has run IMB compiled with -DCHECK and it did not report any errors.
Are there any gotchas with using the dr pml?
Also, if the dr pml is finding errors and is resending data, can I
have it tell me when that happens? Like a verbose mode?
Unfortunately you will need to update the source and recompile. Try
updating this file:
topdir/ompi/mca/pml/dr/pml_dr.h:245:#define MCA_PML_DR_DEBUG_LEVEL -1
and change MCA_PML_DR_DEBUG_LEVEL to 0.
The problem is that, when running HPL, he sees failed residuals. When
running HPL under MPICH-GM, he does not.
I have tried running HPCC (HPL plus other benchmarks) using OMPI with
GM on 32-bit Xeons and 64-bit Opterons. I do not see any failed
residuals. I am trying to get access to a couple of OSX machines to
replicate Brock's setup.
Scott
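
For anyone following the thread who has not looked at what a "failed residual" means: after solving Ax = b, HPL computes scaled residuals and compares them against the threshold in HPL.dat (16.0 by default). The sketch below is not HPL's code. It assumes one residual of the form ||Ax-b||_oo / (eps * ||A||_oo * ||x||_oo); the exact formulas differ between HPL versions, and the function names and the small 2x2 system are made up for illustration. It only shows why even a tiny corruption of the solution (for example from bad data on the wire) pushes the residual far past the threshold.

```python
import sys

def inf_norm_vec(v):
    # Infinity norm of a vector: largest absolute entry.
    return max(abs(x) for x in v)

def inf_norm_mat(a):
    # Infinity norm of a matrix: largest absolute row sum.
    return max(sum(abs(x) for x in row) for row in a)

def scaled_residual(a, x, b):
    # Assumed form: ||Ax - b||_oo / (eps * ||A||_oo * ||x||_oo).
    eps = sys.float_info.epsilon
    r = [sum(aij * xj for aij, xj in zip(row, x)) - bi
         for row, bi in zip(a, b)]
    return inf_norm_vec(r) / (eps * inf_norm_mat(a) * inf_norm_vec(x))

def residual_passes(a, x, b, threshold=16.0):
    # 16.0 is the default threshold in HPL.dat.
    return scaled_residual(a, x, b) < threshold

# Hypothetical 2x2 system with an exactly representable solution.
a = [[2.0, 1.0], [1.0, 3.0]]
b = [4.0, 7.0]
x_good = [1.0, 2.0]
x_bad = [1.0, 2.0 + 1e-9]   # solution with one slightly corrupted entry

print(residual_passes(a, x_good, b))
print(residual_passes(a, x_bad, b))
```

Because the residual is scaled by machine epsilon, an error of 1e-9 in a single entry is already many orders of magnitude over the threshold, which is why HPL is a sensitive detector of data corruption.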