Scott,
Thanks for your advice! Good to know about the checksum debug
functionality! Strangely enough, running with either "MX_CSUM=1" or "-mca
pml cm" allows Murasaki to work normally, and makes the test case I
attached in my previous mail work. Very suspicious, but at least this
does make a functi
Hi. I've now spent many, many hours tracking down a bug that was causing
my program to die, as though either its memory were getting corrupted or
messages were getting clobbered while going through the network; I
couldn't tell which. I really wish the checksum flag on btl_mx_flags
were working. But
So I'm working on this program, which has many ways it might possibly die
at runtime, but one that happens frequently is when the user types a
wrong (nonexistent) filename at the command prompt. As it is now, the
node looking for the file notices the file doesn't exist and tries to
terminate the
Wow, this sparked a much more heated discussion than I was expecting. I
was just commenting that the behaviour the original author (Federico
Sacerdoti) mentioned would explain something I observed in one of my
early trials of OpenMPI. But anyway, because it seems that quite a few
people were interes
That would make sense. I was able to break OpenMPI by having Node A wait for
messages from Node B. Node B is in fact sleeping while Node C bombards
Node A with a few thousand messages. After a while Node B wakes up and
sends Node A the message it's been waiting on, but Node A has long since
been buried
> Subject: Re: [OMPI users] mixed myrinet/non-myrinet nodes
> From: M D Jones (jonesm_at_[hidden])
> Date: 2008-01-15 14:07:19
> Hmm, that is the way that I expected it to work as well -
> we see the warnings also, but closely followed by the
> errors (I've been trying both 1.2.5 and a recent 1.3
We also have a mixed myrinet/ip cluster, and maybe I'm missing some
nuance of your configuration, but openmpi seems to work fine for me "as
is" with no --mca options across mixed nodes (there's a bunch of
warnings at the beginning where the non-mx nodes realize they don't have
myrinet cards and the
Hi Warner. The simplest way would certainly be to launch your job with
the mpirun --nolocal option. If you're sure you want a
hostfile-based way to set this, simply removing the headnode from the
hostfile would also work.
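
For illustration, here is a rough sketch of the hostfile approach. The host
names (headnode, node01, node02), the slot counts, the file names, and
./my_program are all made up for the example; substitute whatever your
cluster actually uses.

```shell
# Hypothetical full hostfile; "headnode" stands in for the machine
# you run mpirun from. All names and slot counts are assumptions.
cat > hosts.all <<'EOF'
headnode slots=2
node01 slots=2
node02 slots=2
EOF

# Drop the head node entry to get a compute-only hostfile.
grep -v '^headnode[[:space:]]' hosts.all > hosts.compute
cat hosts.compute

# Then launch with something like:
#   mpirun --hostfile hosts.compute -np 4 ./my_program
# or keep the full hostfile and let mpirun skip the local node:
#   mpirun --nolocal --hostfile hosts.all -np 4 ./my_program
```

The --nolocal route leaves the hostfile alone and is easy to turn off
again; the edited hostfile is the one to use if the head node should never
run ranks no matter how the job gets launched.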
--
--Kris
叶ってしまう夢は本当の夢と言えん。
[A dream that comes true can't really be called a true dream.]