In the end it was a network problem. I had to stop Open MPI from using one of
the network interfaces on the master node of the cluster by restricting its
TCP transport to eth1, i.e. by setting
btl_tcp_if_include = eth1 in the file /usr/local/etc/openmpi-mca-params.conf
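
To work out which interface name belongs on the btl_tcp_if_include line, it
helps to list the interfaces each node has and the IPv4 addresses they carry.
Below is a minimal sketch of a hypothetical helper, list_ipv4_interfaces()
(my own name, not part of Open MPI), built on getifaddrs(), which is
available on Linux and the BSDs. The idea is to run it on every node and pick
the interface whose addresses are on the network all the nodes share.

```cpp
#include <arpa/inet.h>   // inet_ntop
#include <ifaddrs.h>     // getifaddrs, freeifaddrs
#include <netinet/in.h>  // sockaddr_in, INET_ADDRSTRLEN
#include <sys/socket.h>  // AF_INET
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Open MPI): returns one
// (interface name, IPv4 address) pair per IPv4 address on this host.
std::vector<std::pair<std::string, std::string>> list_ipv4_interfaces() {
    std::vector<std::pair<std::string, std::string>> result;
    struct ifaddrs* head = nullptr;
    if (getifaddrs(&head) != 0) {
        return result;  // query failed: report no interfaces
    }
    for (struct ifaddrs* ifa = head; ifa != nullptr; ifa = ifa->ifa_next) {
        // Skip entries with no address or with a non-IPv4 address.
        if (ifa->ifa_addr == nullptr || ifa->ifa_addr->sa_family != AF_INET) {
            continue;
        }
        const auto* sin = reinterpret_cast<const struct sockaddr_in*>(ifa->ifa_addr);
        char buf[INET_ADDRSTRLEN];
        // Convert the binary address to dotted-decimal text.
        if (inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof buf) != nullptr) {
            result.emplace_back(ifa->ifa_name, buf);
        }
    }
    freeifaddrs(head);
    return result;
}
```

For a quick test without editing the params file, the same MCA parameter can
also be given on the mpirun command line, e.g.
mpirun --mca btl_tcp_if_include eth1 -np 2 -hostfile foo ./program
(where ./program stands for your own executable).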

Thank you all for your help.

Jose Pedro
On 3/1/06, Jose Pedro Garcia Mahedero <jpgmahed...@gmail.com> wrote:
>
> OK, it ALMOST works!!
>
> Now I've installed MPI on a non-clustered machine, and it works. Surprisingly,
> it works fine with machine OUT1 as master and machine CLUSTER1 as slave,
> but (and this was my surprise) it doesn't work the other way around! If I
> run the same program with CLUSTER1 as master, it sends only one message
> from master to slave and blocks while sending the second message. Maybe it
> is a firewall/iptables problem.
>
> Does anybody know which ports MPI uses to send requests/responses, or how
> to trace them? What I really don't understand is why it happens at the
> second message and not the first one :-( I know my slave never finishes,
> but it isn't meant to right now (it will in the next version), and I don't
> think that is the main problem :-S
>
> I'm sending an attachment with the (very simple) code and a tarball with my
> config.log.
>
> Thanks
>
>
> On 3/1/06, Jose Pedro Garcia Mahedero <jpgmahed...@gmail.com> wrote:
> >
> > You're right. I'll try NetPIPE first and then the application.
> > If it doesn't work, I'll send my configs and more detailed information.
> >
> > Thank you!
> >
> > On 3/1/06, Brian Barrett <brbar...@open-mpi.org> wrote:
> > >
> > > Jose -
> > >
> > > I noticed that your output doesn't appear to match what the source
> > > code is capable of generating.  It's possible that you're running
> > > into problems with the code that we can't see because you didn't send
> > > a complete version of the source code.
> > >
> > > You might want to start by running some 3rd party codes that are
> > > known to be good, just to make sure that your MPI installation checks
> > > out.  A good start is NetPIPE, which runs between two peers and gives
> > > latency / bandwidth information.  If that runs, then it's time to
> > > look at your application.  If that doesn't run, then it's time to
> > > look at the MPI installation in more detail.  In this case, it would
> > > be useful to see all of the information requested here:
> > >
> > >    http://www.open-mpi.org/community/help/
> > >
> > > as well as from running the mpirun command used to start NetPIPE with
> > > the -d option, so something like:
> > >
> > >    mpirun -np 2 -hostfile foo -d ./NPMpi
> > >
> > > Brian
> > >
> > > On Feb 28, 2006, at 9:29 AM, Jose Pedro Garcia Mahedero wrote:
> > >
> > > > Hello everybody.
> > > >
> > > > I'm new to MPI and I'm having some problems while running a simple
> > > > ping-pong program on more than one node.
> > > >
> > > > 1.- I followed all the instructions and installed Open MPI without
> > > > problems on a Beowulf cluster.
> > > > 2.- The cluster is working OK, and SSH keys are set up so that there
> > > > is no password prompt.
> > > > 3.- mpiexec seems to run OK.
> > > > 4.- Now I'm using just 2 nodes: I've tried a simple ping-pong
> > > > application, but my master only sends one request!!
> > > > 5.- I reduced the problem by trying to send just two messages to the
> > > > same node:
> > > >
> > > > #include <mpi.h>
> > > > #include <unistd.h>   // sleep()
> > > > #include <iostream>
> > > >
> > > > using std::cout;
> > > > using std::endl;
> > > >
> > > > // The tag values were not in the snippet; these are assumed.
> > > > const int WORKTAG = 1;
> > > > const int DIETAG = 2;
> > > >
> > > > int main(int argc, char **argv) {
> > > >   int myrank;
> > > >
> > > >   /* Initialize MPI */
> > > >   MPI_Init(&argc, &argv);
> > > >
> > > >   /* Find out my identity in the default communicator */
> > > >   MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
> > > >
> > > >   if (myrank == 0) {
> > > >     int work = 100;
> > > >     int count = 0;
> > > >     for (int i = 0; i < 10; i++) {
> > > >       cout << "MASTER IS SLEEPING..." << endl;
> > > >       sleep(3);
> > > >       cout << "MASTER AWAKE WILL SEND[" << count++ << "]:" << work << endl;
> > > >       MPI_Send(&work, 1, MPI_INT, 1, WORKTAG, MPI_COMM_WORLD);
> > > >     }
> > > >     // Tell the slave to stop; without this it loops forever.
> > > >     MPI_Send(&work, 1, MPI_INT, 1, DIETAG, MPI_COMM_WORLD);
> > > >   } else {
> > > >     int count = 0;
> > > >     int work;
> > > >     MPI_Status status;
> > > >     while (true) {
> > > >       MPI_Recv(&work, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
> > > >       cout << "SLAVE[" << myrank << "] RECEIVED[" << count++ << "]:" << work << endl;
> > > >       if (status.MPI_TAG == DIETAG) {
> > > >         break;
> > > >       }
> > > >     }  // while
> > > >   }
> > > >
> > > >   MPI_Finalize();
> > > >   return 0;
> > > > }
> > > >
> > > > 6a.- RESULTS (if I put more than one machine in my mpihostsfile):
> > > > my master sends the first message and my slave receives it
> > > > perfectly, but my master doesn't send its second message.
> > > >
> > > > Here's my output:
> > > >
> > > > MASTER IS SLEEPING...
> > > > MASTER AWAKE WILL SEND[0]:100
> > > > MASTER IS SLEEPING...
> > > > SLAVE[1] RECEIVED[0]:100MPI_STATUS.MPI_ERROR:0
> > > > MASTER AWAKE WILL SEND[1]:100
> > > >
> > > > 6b.- RESULTS (if I put ONLY 1 machine in my mpihostsfile):
> > > > everything is OK all the way through iteration 9!!!
> > > > MASTER IS SLEEPING...
> > > > MASTER AWAKE WILL SEND[0]:100
> > > > MASTER IS SLEEPING...
> > > > MASTER AWAKE WILL SEND[1]:100
> > > > MASTER IS SLEEPING...
> > > > MASTER AWAKE WILL SEND[2]:100
> > > > MASTER IS SLEEPING...
> > > > MASTER AWAKE WILL SEND[3]:100
> > > > MASTER IS SLEEPING...
> > > > MASTER AWAKE WILL SEND[4]:100
> > > > MASTER IS SLEEPING...
> > > > MASTER AWAKE WILL SEND[5]:100
> > > > MASTER IS SLEEPING...
> > > > MASTER AWAKE WILL SEND[6]:100
> > > > MASTER IS SLEEPING...
> > > > MASTER AWAKE WILL SEND[7]:100
> > > > MASTER IS SLEEPING...
> > > > MASTER AWAKE WILL SEND[8]:100
> > > > MASTER IS SLEEPING...
> > > > MASTER AWAKE WILL SEND[9]:100
> > > > SLAVE[1] RECEIVED[0]:100MPI_STATUS.MPI_ERROR:0
> > > > SLAVE[1] RECEIVED[1]:100MPI_STATUS.MPI_ERROR:0
> > > > SLAVE[1] RECEIVED[2]:100MPI_STATUS.MPI_ERROR:0
> > > > SLAVE[1] RECEIVED[3]:100MPI_STATUS.MPI_ERROR:0
> > > > SLAVE[1] RECEIVED[4]:100MPI_STATUS.MPI_ERROR:0
> > > > SLAVE[1] RECEIVED[5]:100MPI_STATUS.MPI_ERROR:0
> > > > SLAVE[1] RECEIVED[6]:100MPI_STATUS.MPI_ERROR:0
> > > > SLAVE[1] RECEIVED[7]:100MPI_STATUS.MPI_ERROR:0
> > > > SLAVE[1] RECEIVED[8]:100MPI_STATUS.MPI_ERROR:0
> > > > SLAVE[1] RECEIVED[9]:100MPI_STATUS.MPI_ERROR:0
> > > > --------------------------------
> > > >
> > > > I know this is a lot of text, but I wanted to make the question as
> > > > detailed as possible. I've searched the FAQ, but I still don't know
> > > > what is going on (or why)...
> > > >
> > > > Can anyone help, please? :-)
> > > >
> > > > _______________________________________________
> > > > users mailing list
> > > > us...@open-mpi.org
> > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > >
> > > --
> > >    Brian Barrett
> > >    Open MPI developer
> > >    http://www.open-mpi.org/
> > >
> > >
> >
> >
>
>
