George,
Using DR was suggested to see if it could find an error. The original
problem was using OB1, and HPL gave failed residuals. The hope was
that DR would pinpoint any problems. It did not, and HPL did not
progress at all (the GM counters incremented, but no tests were
completed successfully).
On Dec 7, 2006, at 3:14 PM, George Bosilca wrote:
On Dec 7, 2006, at 2:45 PM, Brock Palen wrote:
$ mpirun -np 4 -machinefile hosts -mca btl ^tcp -mca btl_gm_min_rdma_size $((10*1024*1024)) ./hpcc.ompi.gm
and HPL passes. The problem seems to be in the RDMA fragmenting code
on OSX. The boundary values at the edges of the fragments are not correct.
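As a quick sanity check, ompi_info should be able to show what the GM BTL's RDMA threshold is currently set to, so you can confirm that the override above is actually picked up. This is only a sketch; the grep pattern is just for convenience:
$ ompi_info --param btl gm | grep rdma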
There were two issues here; one found the other. OB1 works just fine
on OSX on PPC64. The DR PML does not work: there is no output to
STDOUT, and while you can see the threads in 'top', no progress is
ever made in running the application. The original problem stems from
the RDMA fragmenting code on OSX.
Something is not clear for me in this discussion. Sometimes the
subject was the DR PML and sometimes the OB1 PML. In fact I'm
completely in the dark ... Which PML fails the HPCC test on the Mac?
When I look at the command line, it looks like it should be OB1, not DR ...
george.
That is wonderful; that fixes the observed problem for running with
OB1. Has a bug been filed for this, to get RDMA working on Macs?
The only working MPI lib is MPICH-GM, as this problem happens with
LAM-7.1.3 also.
So we are on track for one bug.
Would the person working on the DR PML like m
On Dec 6, 2006, at 3:09 PM, Scott Atchley wrote:
Brock and Galen,
We are willing to assist. Our best guess is that OMPI is using the
code in a different way than MPICH-GM does. One of our other
developers, who is more comfortable with the GM API, is looking into it.
We tried running with HPCC w
On Dec 6, 2006, at 2:29 PM, Brock Palen wrote:
I wonder if we can narrow this down a bit to perhaps a PML protocol
issue.
Start by disabling RDMA by using:
-mca btl_gm_flags 1
On the other hand, with OB1, using btl_gm_flags 1 fixed the error
problem with OMPI, which is a great first step!
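For reference, a complete invocation along the lines of the earlier runs would look something like the sketch below; the hostfile and executable names are just the ones used elsewhere in this thread, so adjust as needed:
$ mpirun -np 4 -machinefile hosts -mca btl ^tcp -mca btl_gm_flags 1 ./xhpl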
The btl_gm_flags 1 suggestion helps some; I at least now see the
start-up of HPL, but I never get a single pass. The output ends at:
- Computational tests pass if scaled residuals are less
The problem is that, when running HPL, he sees failed residuals. When
running HPL under MPICH-GM, he does not.
I have tried running HPCC (HPL plus other benchmarks) using OMPI with
GM on 32-bit Xeons and 64-bit Opterons. I do not see any failed
residuals. I am trying to get access to a couple of
Are there any gotchas on using the dr pml?
Also, if the dr pml is finding errors and is resending data, can I
have it tell me when that happens? Like a verbose mode?
Unfortunately you will need to update the source and recompile. Try
updating this file: topdir/ompi/mca/pml/dr/pml_dr.h:245:
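Whatever the change at line 245 turns out to be, the recompile afterwards should just be the usual rebuild from the top of the tree; this is a sketch that assumes an in-tree autotools build:
$ cd topdir
$ make all install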
On Dec 5, 2006, at 6:15 PM, Galen M. Shipman wrote:
Brock Palen wrote:
I was asked by Myricom to run a test using the data reliability PML
(dr). I ran it like so:
$ mpirun --mca pml dr -np 4 ./xhpl
Is this the right format for running the dr pml? Also, it has been
running for a long time but has produced no output. The counters on
the GM card are incrementing, (no
This should be fine, yes. I can run HPL on our test cluster to see if
something is wrong.
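If the goal is to make sure the DR run actually goes over GM rather than TCP, the PML selection can be combined with the BTL exclusion used elsewhere in this thread. This is a sketch assembled from the flags above, not a command taken verbatim from any one message:
$ mpirun --mca pml dr --mca btl ^tcp -np 4 -machinefile hosts ./xhpl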