Re: [OMPI users] Open MPI v1.2.5 released
Hi Warner. The simplest way would certainly be to launch your job with the mpirun --nolocal option. If you're sure you want a hostfile-based way to set this, simply removing the head node from the hostfile would also work (a short example of both approaches follows after the quoted message).

--
--Kris
叶ってしまう夢は本当の夢と言えん。 [A dream that comes true can't really be called a dream.]

Warner Yuen wrote:
> Date: Wed, 9 Jan 2008 12:50:09 -0800
> From: Warner Yuen
> Subject: Re: [OMPI users] Open MPI v1.2.5 released
> To: us...@open-mpi.org
> Message-ID:
> Content-Type: text/plain; charset="us-ascii"
>
> Thanks to Brian Barrett, I was able to get through some ugly Intel compiler bugs during the configure script. I now have OMPI v1.2.5 running nicely under Mac OS X v10.5 Leopard!
>
> However, I have a question about hostfiles. I would like to manually launch MPI jobs from my head node, but I don't want the jobs to run on the head node. In LAM/MPI I could add a "hostname schedule=no" to the hostfile; is there an equivalent in Open MPI? I'm sure this has come up before, but I couldn't find an answer in the archives.
>
> Thanks,
>
> -Warner
>
> Warner Yuen
> Scientific Computing Consultant
> Apple Computer
> email: wy...@apple.com
> Tel: 408.718.2859
> Fax: 408.715.0133
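For reference, a minimal illustration of both approaches: keeping the head node in the hostfile but skipping the node mpirun is invoked on with --nolocal, or leaving the head node out of the hostfile entirely. The hostnames, slot counts, and application name here are made up; adjust them to your cluster.

  # Option 1: keep the head node listed, but skip the node mpirun runs on
  $ cat hostfile
  headnode slots=2
  node01 slots=4
  node02 slots=4
  $ mpirun --nolocal --hostfile hostfile -np 8 ./my_mpi_app

  # Option 2: leave the head node out of the hostfile altogether
  $ cat hostfile
  node01 slots=4
  node02 slots=4
  $ mpirun --hostfile hostfile -np 8 ./my_mpi_app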
Re: [OMPI users] mixed myrinet/non-myrinet nodes
We also have a mixed Myrinet/IP cluster, and maybe I'm missing some nuance of your configuration, but Open MPI seems to work fine for me "as is" with no --mca options across mixed nodes. There are a bunch of warnings at the beginning where the non-MX nodes realize they don't have Myrinet cards and the MX nodes realize they can't talk MX to the non-MX nodes, but everything completes fine, so I assumed Open MPI was working out the transport details on its own (and was quite pleased about that). I just did a quick test to confirm that it is in fact still using MX in that situation, and it is. I'm running Open MPI 1.2.4 and MX 1.2.3.

It sounds to me, based on those "PML add procs failed" messages, that Open MPI is dying on startup on the non-MX nodes unless you explicitly disable MX at runtime (perhaps because they're expecting the MX library to be there, but it's not?). The two invocations discussed in this thread are collected in the short example after the quoted message below.

users-request-at-open-mpi.org |openmpi-users/Allow| wrote:
> Date: Tue, 15 Jan 2008 10:25:00 -0500 (EST)
> From: M D Jones
> Subject: Re: [OMPI users] mixed myrinet/non-myrinet nodes
> To: Open MPI Users
> Message-ID:
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
> Hmm, that combination seems to hang on me - but '--mca pml ob1 --mca btl ^mx' does indeed do the trick. Many thanks!
>
> Matt
>
> On Tue, 15 Jan 2008, George Bosilca wrote:
>
>> This case actually works. We ran into it a few days ago, when we discovered that one of the compute nodes in a cluster didn't get its Myrinet card installed properly ... The performance was horrible but the application ran to completion.
>>
>> You will have to use the following flags: --mca pml ob1 --mca btl mx,tcp,self

--
--Kris
叶ってしまう夢は本当の夢と言えん。 [A dream that comes true can't really be called a dream.]
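For later readers, the two invocations from this thread side by side. The --mca flags are exactly as quoted above; the process count, hostfile name, and executable name are placeholders.

  # Let the ob1 PML pick among the MX, TCP and self BTLs on a per-peer basis:
  $ mpirun --mca pml ob1 --mca btl mx,tcp,self -np 16 -hostfile hosts ./my_mpi_app

  # Or exclude the MX BTL entirely on a heterogeneous run:
  $ mpirun --mca pml ob1 --mca btl ^mx -np 16 -hostfile hosts ./my_mpi_app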
Re: [OMPI users] mixed myrinet/non-myrinet nodes
> Subject: Re: [OMPI users] mixed myrinet/non-myrinet nodes
> From: M D Jones (jonesm_at_[hidden])
> Date: 2008-01-15 14:07:19
>
> Hmm, that is the way that I expected it to work as well - we see the warnings also, but closely followed by the errors (I've been trying both 1.2.5 and a recent 1.3 snapshot with the same behavior). You don't have the mx driver loaded on the nodes that do not have a myrinet card, do you?

Well, the driver isn't "loaded" (i.e., the kernel module isn't loaded), but the library (libmyriexpress.so) is available. If that library isn't available, Open MPI will probably fail when it tries to call the MX functions (even if only to find that there's no Myrinet card available).

> Our mx is a touch behind yours (1.2.3), but I agree that it appears to be something in the process startup that is at fault, so it doesn't seem likely that the mx version is to blame (perhaps just the fact that it is not installed on those nodes?).
>
> Matt
>
> On Wed, 16 Jan 2008, 8mj6tc902_at_[hidden] wrote:
>
>> We also have a mixed Myrinet/IP cluster, and maybe I'm missing some nuance of your configuration, but Open MPI seems to work fine for me "as is" with no --mca options across mixed nodes (there's a bunch of warnings at the beginning where the non-MX nodes realize they don't have Myrinet cards and the MX nodes realize they can't talk MX to the non-MX nodes, but everything completes fine, so I assumed Open MPI was working out the transport details on its own (and was quite pleased about that)).
>>
>> I just did a quick test to confirm that it is in fact still using MX in that situation, and it is. I'm running Open MPI 1.2.4 and MX 1.2.3.
>>
>> It sounds to me based on those "PML add procs failed" messages that Open MPI is dying on startup on the non-MX nodes unless you explicitly disable MX at runtime (perhaps because they're expecting the MX library to be there, but it's not?)
>>
>> users-request-at-open-mpi.org |openmpi-users/Allow| wrote:
>>> Date: Tue, 15 Jan 2008 10:25:00 -0500 (EST)
>>> From: M D Jones
>>> Subject: Re: [OMPI users] mixed myrinet/non-myrinet nodes
>>> To: Open MPI Users
>>> Message-ID:
>>> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>>>
>>> Hmm, that combination seems to hang on me - but '--mca pml ob1 --mca btl ^mx' does indeed do the trick. Many thanks!
>>>
>>> Matt
>>>
>>> On Tue, 15 Jan 2008, George Bosilca wrote:
>>>
>>>> This case actually works. We ran into it a few days ago, when we discovered that one of the compute nodes in a cluster didn't get its Myrinet card installed properly ... The performance was horrible but the application ran to completion.
>>>>
>>>> You will have to use the following flags: --mca pml ob1 --mca btl mx,tcp,self

--
--Kris
叶ってしまう夢は本当の夢と言えん。 [A dream that comes true can't really be called a dream.]
Re: [OMPI users] openmpi credits for eager messages
That would make sense. I was able to break Open MPI by having Node A wait for messages from Node B. Node B is in fact sleeping while Node C bombards Node A with a few thousand messages. After a while Node B wakes up and sends Node A the message it's been waiting on, but Node A has long since been buried and segfaults. If I decrease the number of messages C is sending, it works properly. This was on Open MPI 1.2.4, using (I think) the SM BTL - it might have been MX or TCP, but certainly not InfiniBand. I could dig up the test and try again if anyone is seriously curious; a bare-bones sketch of the setup follows at the end of this message.

Trying the same test on MPICH/MX went very, very slowly (I don't think they have any clever buffer management), but it didn't crash.

Sacerdoti, Federico Federico.Sacerdoti-at-deshaw.com |openmpi-users/Allow| wrote:
> Hi,
>
> I am readying an openmpi 1.2.5 software stack for use with a many-thousand core cluster. I have a question about sending small messages that I hope can be answered on this list.
>
> I was under the impression that if node A wants to send a small MPI message to node B, it must have a credit to do so. The credit assures A that B has enough buffer space to accept the message. Credits are required by the mpi layer regardless of the BTL transport layer used.
>
> I have been told by a Voltaire tech that this is not so, the credits are used by the infiniband transport layer to reliably send a message, and is not an openmpi feature.
>
> Thanks,
> Federico
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
--Kris
叶ってしまう夢は本当の夢と言えん。 [A dream that comes true can't really be called a dream.]
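For anyone who wants to reproduce something like this, here is a bare-bones sketch of the scenario described above. It is not the original test (a simplified version of that is attached later in the thread); the rank assignments, message count, and message size are arbitrary, and whether a message is actually delivered eagerly depends on the BTL's eager limit.

#include <mpi.h>
#include <cstdio>
#include <unistd.h>   // sleep()
#include <vector>

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);
    int rank = 0, nprocs = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    if (nprocs < 3) {                 // needs the three roles described above
        if (rank == 0) std::fprintf(stderr, "run with at least 3 ranks\n");
        MPI_Finalize();
        return 1;
    }

    const int reps   = 5000;          // how hard "Node C" hammers "Node A"
    const int buflen = 5000;          // eager vs. rendezvous depends on the BTL's eager limit
    std::vector<char> buf(buflen, 0);
    MPI_Status st;

    if (rank == 0) {                  // "Node A": waits on B first, so everything from C is unexpected
        MPI_Recv(&buf[0], buflen, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
        for (int i = 0; i < reps; ++i)
            MPI_Recv(&buf[0], buflen, MPI_CHAR, 2, 0, MPI_COMM_WORLD, &st);
    } else if (rank == 1) {           // "Node B": sleeps, then sends the message A is waiting on
        sleep(30);
        MPI_Send(&buf[0], buflen, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    } else if (rank == 2) {           // "Node C": bombards A with small messages
        for (int i = 0; i < reps; ++i)
            MPI_Send(&buf[0], buflen, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }                                 // ranks above 2 simply idle

    MPI_Finalize();
    return 0;
}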
Re: [OMPI users] openmpi credits for eager messages
Wow, this sparked a much more heated discussion than I was expecting. I was just commenting that the behaviour the original author (Federico Sacerdoti) mentioned would explain something I observed in one of my early trials of Open MPI. But anyway, because it seems that quite a few people were interested, I've attached a simplified version of the test I was describing (with all the timing checks and some of the crazier output removed). Now that I go back and retest this, it turns out that it wasn't actually a segfault that was killing it, but running out of memory, as you and others have predicted.

Brian W. Barrett brbarret-at-open-mpi.org |openmpi-users/Allow| wrote:
> Now that this discussion has gone way off into the MPI standard woods :).
>
> Was your test using Open MPI 1.2.4 or 1.2.5 (the one with the segfault)? There was definitely a bug in 1.2.4 that could cause exactly the behavior you are describing when using the shared memory BTL, due to a silly delayed initialization bug/optimization.

I'm still using Open MPI 1.2.4, and actually the SM BTL seems to be the hardest to break (I guess I'm dodging the bullet on that delayed initialization bug you're referring to).

> If you are using the OB1 PML (the default), you will still have the possibility of running the receiver out of memory if the unexpected queue grows without bounds. I'll withhold my opinion on what the standard says so that we can perhaps actually help you solve your problem and stay out of the weeds :). Note however, that in general unexpected messages are a bad idea and thousands of them from one peer to another should be avoided at all costs -- this is just good MPI programming practice.

Actually, I was expecting to break something with this test; I just wanted to find out where it broke. Lesson learned: I wrote my more serious programs doing exactly that (no unexpected messages; a small sketch of one such throttling pattern follows after the quoted thread below). I was just surprised that the default Open MPI settings allowed me to flood the system so easily, whereas MPICH/MX still finished no matter what I threw at it (albeit with terrible performance in the bad cases).

> Now, if you are using MX, you can replicate MPICH/MX's behavior (including the very slow part) by using the CM PML (--mca pml cm on the mpirun command line), which will use the MX library message matching and unexpected queue and therefore behave exactly like MPICH/MX.

That works exactly as you described, and it does indeed prevent memory usage from going wild due to the unexpected messages. Thanks for your help! (And to the others for the educational discussion!)

> Brian
>
> On Sat, 2 Feb 2008, 8mj6tc...@sneakemail.com wrote:
>
>> That would make sense. I was able to break Open MPI by having Node A wait for messages from Node B. Node B is in fact sleeping while Node C bombards Node A with a few thousand messages. After a while Node B wakes up and sends Node A the message it's been waiting on, but Node A has long since been buried and segfaults. If I decrease the number of messages C is sending, it works properly. This was on Open MPI 1.2.4 (using, I think, the SM BTL - it might have been MX or TCP, but certainly not InfiniBand. I could dig up the test and try again if anyone is seriously curious).
>>
>> Trying the same test on MPICH/MX went very, very slowly (I don't think they have any clever buffer management), but it didn't crash.
>> >> Sacerdoti, Federico Federico.Sacerdoti-at-deshaw.com >> |openmpi-users/Allow| wrote: >>> Hi, >>> >>> I am readying an openmpi 1.2.5 software stack for use with a >>> many-thousand core cluster. I have a question about sending small >>> messages that I hope can be answered on this list. >>> >>> I was under the impression that if node A wants to send a small MPI >>> message to node B, it must have a credit to do so. The credit assures A >>> that B has enough buffer space to accept the message. Credits are >>> required by the mpi layer regardless of the BTL transport layer used. >>> >>> I have been told by a Voltaire tech that this is not so, the credits are >>> used by the infiniband transport layer to reliably send a message, and >>> is not an openmpi feature. >>> >>> Thanks, >>> Federico >>> >>> ___ >>> users mailing list >>> us...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/users >> >> > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users -- --Kris 叶ってしまう夢は本当の夢と言えん。 [A dream that comes true can't really be called a dream.] #include #include #include #include #include //for atoi (in case someone doesn't have boost) const int buflen=5000; int main(int argc, char *argv[]) { using namespace std; int reps=1000; if(argc>1){ //optionally specify number of repeats on the command line reps=atoi(argv[1]); } int numprocs, rank, namelen; char processor_name[MPI_MAX_PROCESSOR_NAME]; M
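As a follow-up to Brian's point about unexpected messages, here is one generic way to keep them bounded: the sender pauses for a small acknowledgement every WINDOW messages, so at most one window's worth of sends can sit in the receiver's unexpected queue. This is an illustrative pattern only, not code from the attached test; the window size, message size, message count, and rank assignment are all arbitrary.

#include <mpi.h>
#include <vector>

const int WINDOW = 64;       // assumed batch size; anything modest works
const int BUFLEN = 5000;
const int REPS   = 5000;

// Sender: after every WINDOW messages, block until the receiver acknowledges.
void sender(int dest) {
    std::vector<char> buf(BUFLEN, 'x');
    for (int i = 0; i < REPS; ++i) {
        MPI_Send(&buf[0], BUFLEN, MPI_CHAR, dest, 0, MPI_COMM_WORLD);
        if ((i + 1) % WINDOW == 0) {
            int ack = 0;
            MPI_Recv(&ack, 1, MPI_INT, dest, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }
}

// Receiver: consume a window of messages, then release the sender's next window.
void receiver(int src) {
    std::vector<char> buf(BUFLEN);
    for (int i = 0; i < REPS; ++i) {
        MPI_Recv(&buf[0], BUFLEN, MPI_CHAR, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if ((i + 1) % WINDOW == 0) {
            int ack = i;
            MPI_Send(&ack, 1, MPI_INT, src, 1, MPI_COMM_WORLD);
        }
    }
}

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)      receiver(1);   // needs at least 2 ranks; any extras stay idle
    else if (rank == 1) sender(0);
    MPI_Finalize();
    return 0;
}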
[OMPI users] Proper way to throw an error to all nodes?
So I'm working on this program which has many ways it might possibly die at runtime, but one that happens frequently is the user typing a wrong (non-existent) filename at the command prompt. As it is now, the node looking for the file notices that the file doesn't exist and tries to terminate the program. It calls MPI_Finalize(), but the other nodes are all waiting for a message from the node doing the file reading, so MPI_Finalize waits forever until the user realizes the job isn't doing anything and terminates it manually.

So, my question is: what's the "correct", graceful way to handle situations like this? Is there some MPI function which can basically throw an exception to all other nodes telling them to bail out now? Or is the correct behaviour just to have the node that spotted the error die quietly and wait for the others to notice? (A bare-bones sketch of the situation follows below.)

Thanks for any suggestions!

--
--Kris
叶ってしまう夢は本当の夢と言えん。 [A dream that comes true can't really be called a dream.]
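To make the situation concrete, here is a bare-bones sketch of what is described above; the file handling and message shape are made up. The MPI_Abort call marked in the comment is the commonly suggested way to tear the whole job down from a single rank, rather than calling MPI_Finalize and hanging.

#include <mpi.h>
#include <cstdio>

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int len = 0;                        // something derived from the input file
    if (rank == 0) {
        std::FILE* f = (argc > 1) ? std::fopen(argv[1], "r") : NULL;
        if (!f) {
            std::fprintf(stderr, "cannot open input file\n");
            // Calling MPI_Finalize() here and returning leaves every other
            // rank stuck in the MPI_Bcast below, which is the hang described
            // above. MPI_Abort instead asks the runtime to terminate all
            // processes attached to the communicator:
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        // ... read the file, fill in len, etc. ...
        std::fclose(f);
    }
    // Every other rank waits here for the rank doing the file reading.
    MPI_Bcast(&len, 1, MPI_INT, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}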
[OMPI users] Problems with MPI_Issend and MX
Hi. I've now spent many, many hours tracking down a bug that was causing my program to die, as though either its memory were getting corrupted or messages were getting clobbered while going through the network; I couldn't tell which. I really wish the checksum flag on btl_mx_flags were working. But anyway, I think I've managed to recreate the core of the problem in a small-ish test case which I've attached (verifycontent.cc). This usually segfaults at MPI_Issend after sending about 60-90 messages for me while using Open MPI 1.3.2 with Myricom's mx-1.2.9 drivers on Linux, compiled with gcc 4.3.2. Disabling the mx btl (mpirun -mca btl ^mx) makes it work (likewise for my own larger project, Murasaki). The MPI_Ssend-using version (verifycontent-ssend.cc) also works no problem over MX. So I suspect the issue lies in Open MPI 1.3.2's handling of MPI_Issend over MX, but it's also possible I've horribly misunderstood something fundamental about MPI and it's just my fault, so if that's the case, please let me know (both this test case and Murasaki work over MPICH/MX, though, so Open MPI is definitely doing something different).

Here's a brief description of verifycontent.cc to make reading it easier (a stripped-down sketch of the same send/receive pattern also follows after the attached code):
* given -np=N, half the nodes will be sending, half will be receiving some number of messages (reps)
* each message consists of buflen (5000) chars, set to some value based on the sending node's rank and the sequence number of the message
* the receiving node starts an irecv for each sending node, then tests each request until a message arrives
* the receiver then checks the contents of the message to make sure it matches what was supposed to be in there (this is where my real project, Murasaki, actually fails; I can't seem to replicate that, however)
* the senders meanwhile keep sending messages and dequeuing them when their requests test as completed.

Testing out the current subversion trunk version, 1.4a1r21594: that seems to pass my test case, but it also tends to show errors like "mca_btl_mx_init: mx_open_endpoint() failed with status 20 (Busy)" on startup, and Murasaki still fails (messages turn into zeros about 132KB in), so something still isn't right...

If anyone has any ideas about this test case failing, or my larger issue of messages turning into zeros after 132KB (though sadly sometimes it isn't at 132KB, but straight from 0KB, which is very confusing) while on MX, I'd greatly appreciate it. Even a simple confirmation of "Yes, MPI_Issend/Irecv with MX has issues in 1.3.2" would help my sanity.

--
Kris Popendorf

Keio University
http://murasaki...
<- (Probably too cumbersome to expect most people to test, but if you feel daring, try putting in some Human/Mouse chromosomes over MX) #include #include #include #include #include #include #include //for atoi (in case someone doesn't have boost) const int buflen=5*24; int numprocs, rank, namelen; char processor_name[MPI_MAX_PROCESSOR_NAME]; using namespace std; class Message { public: MPI_Request req; MPI_Status status; char buffer[buflen]; int count; void reset(char val){ memset(buffer,val,sizeof(char)*buflen); } Message(): count(0) { reset(rank); } Message(int _count) : count(_count) { reset(count+rank+1); } bool preVerify(){ char content=rank; for(int b=0;b"<< bi << " = "<< (int)buffer[bi]<"<< bi << " = "<< (int)buffer[bi]<1){ //optionally specify number of repeats on the command line reps=atoi(argv[1]); } MPI_Init(&argc, &argv); MPI_Comm_size(MPI_COMM_WORLD, &numprocs); MPI_Comm_rank(MPI_COMM_WORLD, &rank); MPI_Get_processor_name(processor_name, &namelen); int senders=numprocs/2; int receivers=numprocs-senders; assert(senders>0); assert(receivers>0); cout << "Process "< > sendQs(receivers); vector counts(receivers,0); for(int i=0;i &sendQ=sendQs[receiver]; int target=receiver+senders; sendQ.push_back(Message(counts[receiver]++)); Message &msg=sendQ.back(); char content=msg.count+rank+1; //confirm that everything we're sending hasn't been corrupted assert(msg.buffer); // cerr << rank<< ">Starting send "<:Started send "<::iterator ite=sendQ.begin();ite!=sendQ.end();){ MPI_Test(&ite->req,&f,&ite->status); if(f){ // cerr << "Send "
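For orientation, a stripped-down sketch of the communication pattern described above: the first half of the ranks repeatedly MPI_Issend buffers whose bytes encode the sender's rank and sequence number, and the second half MPI_Irecv and verify them. This is not verifycontent.cc itself; the constants, the one-to-one pairing of ranks, and the error reporting are all simplified.

#include <mpi.h>
#include <iostream>
#include <list>
#include <vector>

const int BUFLEN = 5000;
const int REPS   = 200;

struct Pending {                        // one in-flight Issend and its buffer
    MPI_Request req;
    std::vector<char> buf;
};

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);
    int rank = 0, nprocs = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    if (nprocs < 2 || nprocs % 2 != 0) {      // keep the sender/receiver pairing simple
        if (rank == 0) std::cerr << "run with an even number of ranks\n";
        MPI_Finalize();
        return 1;
    }
    const int senders = nprocs / 2;           // ranks [0, senders) send, the rest receive

    if (rank < senders) {
        const int dest = rank + senders;
        std::list<Pending> queue;
        for (int seq = 0; seq < REPS; ++seq) {
            queue.push_back(Pending());
            Pending& p = queue.back();
            p.buf.assign(BUFLEN, static_cast<char>(rank + seq + 1));
            MPI_Issend(&p.buf[0], BUFLEN, MPI_CHAR, dest, seq, MPI_COMM_WORLD, &p.req);
            // Dequeue whatever has completed so far.
            for (std::list<Pending>::iterator it = queue.begin(); it != queue.end();) {
                int done = 0;
                MPI_Test(&it->req, &done, MPI_STATUS_IGNORE);
                if (done) it = queue.erase(it);
                else      ++it;
            }
        }
        while (!queue.empty()) {               // drain the rest before finalizing
            MPI_Wait(&queue.front().req, MPI_STATUS_IGNORE);
            queue.pop_front();
        }
    } else {
        const int src = rank - senders;        // the matching sender for this receiver
        std::vector<char> buf(BUFLEN);
        for (int seq = 0; seq < REPS; ++seq) {
            MPI_Request req;
            MPI_Irecv(&buf[0], BUFLEN, MPI_CHAR, src, seq, MPI_COMM_WORLD, &req);
            int done = 0;
            while (!done)                      // poll until this message arrives
                MPI_Test(&req, &done, MPI_STATUS_IGNORE);
            const char expected = static_cast<char>(src + seq + 1);
            for (int b = 0; b < BUFLEN; ++b)
                if (buf[b] != expected)
                    std::cerr << "rank " << rank << ": byte " << b
                              << " of message " << seq << " is corrupted\n";
        }
    }

    MPI_Finalize();
    return 0;
}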
Re: [OMPI users] Problems with MPI_Issend and MX
Scott,

Thanks for your advice! Good to know about the checksum debug functionality! Strangely enough, running with either "MX_CSUM=1" or "-mca pml cm" allows Murasaki to work normally, and makes the test case I attached in my previous mail work. Very suspicious, but at least this does make a functional solution (the exact invocations are summarized at the end of this message). However, if I understand Open MPI correctly, I shouldn't be able to use the CM PML over a network where some nodes have MX and some don't, correct?

Scott Atchley atchley-at-myri.com |openmpi-users/Allow| wrote:
> Hi Kris,
>
> I have not run your code yet, but I will try to this weekend.
>
> You can have MX checksum its messages if you set MX_CSUM=1 and use the MX debug library (e.g. LD_LIBRARY_PATH to /opt/mx/lib/debug).
>
> Do you have the problem if you use the MX MTL? To test it modify your mpirun as follows:
>
> $ mpirun -mca pml cm ...
>
> and do not specify any BTL info.
>
> Scott
>
> On Jul 2, 2009, at 6:05 PM, 8mj6tc...@sneakemail.com wrote:
>
>> Hi. I've now spent many many hours tracking down a bug that was causing my program to die, as though either its memory were getting corrupted or messages were getting clobbered while going through the network, I couldn't tell which. I really wish the checksum flag on btl_mx_flags were working. But anyway, I think I've managed to recreate the core of the problem in a small-ish test case which I've attached (verifycontent.cc). This usually segfaults at MPI_Issend after sending about 60-90 messages for me while using OpenMPI 1.3.2 with myricom's mx-1.2.9 drivers on linux using gcc 4.3.2. Disabling the mx btl (mpirun -mca btl ^mx) makes it work (likewise, the same for my own larger project (Murasaki)). The MPI_Ssend using version (verifycontent-ssend.cc) also works no problem over mx. So I suspect the issue lies in OpenMPI 1.3.2's handling of MPI_Issend over mx, but it's also possible I've horribly misunderstood something fundamental about MPI and it's just my fault, so if that's the case, please let me know (but both this test case and Murasaki work over mpichmx, so OpenMPI is definitely doing something different).
>>
>> Here's a brief description of verifycontent.cc to make reading it easier:
>> * given -np=N, half the nodes will be sending, half will be receiving some number of messages (reps)
>> * each message consists of buflen (5000) chars, set to some value based on the sending node's rank and the sequence number of the message
>> * the receiving node starts an irecv for each sending node, tests each request until a message arrives
>> * the receiver then checks the contents of the message to make sure it matches what was supposed to be in there (this is where my real project, Murasaki, fails actually. I can't seem to replicate that however).
>> * the senders meanwhile keep sending messages and dequeuing them when their request tests as completed.
>>
>> Testing out the current subversion trunk version, 1.4a1r21594, that seems to pass my test case, but also tends to show errors like "mca_btl_mx_init: mx_open_endpoint() failed with status 20 (Busy)" on startup, and Murasaki still fails (messages turn into zeros about 132KB in), so something still isn't right...
>>
>> If anyone has any ideas about this test case failing, or my larger issue of messages turning into zeros after 132KB (though sadly sometimes it isn't at 132KB, but straight from 0KB, which is very confusing) while on MX, I'd greatly appreciate it. Even a simple confirmation of "Yes, MPI_Issend/Irecv with MX has issues in 1.3.2" would help my sanity.
>> --
>> Kris Popendorf
>>
>> Keio University
>> http://murasaki... <- (Probably too cumbersome to expect most people to test, but if you feel daring, try putting in some Human/Mouse chromosomes over MX)
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
--Kris
叶ってしまう夢は本当の夢と言えん。 [A dream that comes true can't really be called a dream.]
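Putting Scott's two suggestions together, invocations along these lines exercise them. The debug-library path is the one Scott quotes; the host list, process count, and program name are examples only.

  # MX debug library with checksumming enabled (exported to the remote ranks with -x):
  $ export MX_CSUM=1
  $ export LD_LIBRARY_PATH=/opt/mx/lib/debug:$LD_LIBRARY_PATH
  $ mpirun -x MX_CSUM -x LD_LIBRARY_PATH -np 8 -hostfile hosts ./verifycontent

  # MX MTL via the CM PML instead of the MX BTL (no BTL options given):
  $ mpirun -mca pml cm -np 8 -hostfile hosts ./verifycontent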