Re: [OMPI users] simplest way to check message queues

2010-09-02 Thread Ashley Pittman
On 1 Sep 2010, at 23:32, Jaison Mulerikkal wrote: > Hi, > > I am getting interested in this thread. > > I'm looking for some solutions, where I can redirect a task/message > (MPI_Send) to a particular process (say rank 1), which is in a queue (at rank > 1) to another process (say rank 2), if t

Re: [OMPI users] simplest way to check message queues

2010-09-02 Thread Ashley Pittman
On 2 Sep 2010, at 15:56, Brock Palen wrote: > Ashley, still having trouble using padb with openmpi/1.4.2 > > [dianawon@nyx0862 ~]$ /home/software/rhel5/padb/3.0/padb -a -Q > [nyx0862.engin.umich.edu:30717] [[16608,0],0]-[[25542,0],0] oob-tcp: > Communication retries exceeded. Can not communicate

Re: [OMPI users] simplest way to check message queues

2010-09-02 Thread Brock Palen
Ashley, still having trouble using padb with openmpi/1.4.2 [dianawon@nyx0862 ~]$ /home/software/rhel5/padb/3.0/padb -a -Q [nyx0862.engin.umich.edu:30717] [[16608,0],0]-[[25542,0],0] oob-tcp: Communication retries exceeded. Can not communicate with peer [nyx0862.engin.umich.edu:30717] [[16608,0],0]

Re: [OMPI users] simplest way to check message queues

2010-09-02 Thread Brock Palen
Ah ok, I put it there just because the user couldn't read that from my home space, and never even thought of that. gahhh. Thanks, BTW I tried joining the padb mailing list. Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985 On Sep 1, 2010, at 6:11

Re: [OMPI users] simplest way to check message queues

2010-09-01 Thread Jaison Mulerikkal
Hi, I am getting interested in this thread. I'm looking for some solutions, where I can redirect a task/message (MPI_Send) to a particular process (say rank 1), which is in a queue (at rank 1) to another process (say rank 2), if the queue is longer at rank 1. How can I do it? First of all, I

Re: [OMPI users] simplest way to check message queues

2010-09-01 Thread Ashley Pittman
padb as a binary (it's a Perl script) needs to exist on all nodes, as it calls orterun on itself; try installing it to a shared directory or copying padb to /tmp on every node. To access the message queues padb needs a compiled helper program which is installed in $PREFIX/lib so I would recomme

Re: [OMPI users] simplest way to check message queues

2010-09-01 Thread Brock Palen
We have ddt, but we do not have licenses to attach to the number of cores these jobs run at. I tried padb, but it fails. Example: ssh to root node for running MPI job: /tmp/padb -Q -a [nyx0862.engin.umich.edu:25054] [[22211,0],0]-[[25542,0],0] oob-tcp: Communication retries exceeded. Can n

Re: [OMPI users] simplest way to check message queues

2010-09-01 Thread Ashley Pittman
On 1 Sep 2010, at 21:13, Brock Palen wrote: > I have a code for a user (namd if anyone cares) that on a specific case will > lock up, a quick ltrace shows the processes doing Iprobes over and over, so > this makes me think that a process someplace is blocking on communication. > > What is