Hi all, I asked for help with a code problem here a few days ago ( http://www.open-mpi.org/community/lists/users/2011/02/15656.php ). I then found that the code runs without any issue on another cluster, so I suspected that something might be wrong with my cluster environment configuration. I reconfigured NFS, SSH, and other related things and reinstalled the Open MPI library. The cluster consists of two desktops connected by a crossover cable. Both desktops have an Intel Core 2 Duo CPU and run Ubuntu 10.04 LTS, and the version of Open MPI installed on the NFS share (located on the master node) is 1.4.3.
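In case it helps to rule out the basics, these are the sanity checks that seem relevant for this setup (the slave hostname below is a placeholder; only "kongdragon-master" appears in my logs):

```shell
# Sanity checks for a two-node Open MPI + NFS setup (hostnames are placeholders).

# Passwordless SSH must work from the master to the slave (and back):
ssh kongdragon-slave hostname

# Both nodes must see the same Open MPI installation; the reported
# version should match the 1.4.3 installed on the NFS share:
ompi_info | grep "Open MPI:"
ssh kongdragon-slave 'ompi_info | grep "Open MPI:"'

# Each node should resolve the other's hostname to the crossover-cable
# address, not 127.0.0.1 (a common /etc/hosts pitfall on Ubuntu):
getent hosts kongdragon-master kongdragon-slave
```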
Now things seem to be getting worse. I can't successfully run any code more complicated than "MPI hello world". But if all of the processes are launched on the same node, the code runs without any issue. For example, the following code (only one line added to "MPI hello world") crashes at the MPI_Barrier. However, if I delete the MPI_Barrier line, the code runs successfully.
****************************************************************************************************
#include <stdio.h>
#include "mpi.h"

int main(int argc, char** argv)
{
    int myrank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    printf("First hello from processor %d of %d\n", myrank, nprocs);
    MPI_Barrier(MPI_COMM_WORLD);
    printf("Second hello from processor %d of %d\n", myrank, nprocs);

    MPI_Finalize();
    return 0;
}
****************************************************************************************************
The output of the above code is:
****************************************************************************************************
[kongdragon-master:16119] *** An error occurred in MPI_Barrier
[kongdragon-master:16119] *** on communicator MPI_COMM_WORLD
[kongdragon-master:16119] *** MPI_ERR_IN_STATUS: error code in status
[kongdragon-master:16119] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
First hello from processor 0 of 2
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 16119 on node kongdragon-master exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
First hello from processor 1 of 2
****************************************************************************************************
Can anyone help point out why things didn't work? Thanks!

Kong
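In case the launch command matters, here is a minimal sketch of how such a job is launched across the two nodes. The hostfile contents, the executable name, and the interface name eth0 are placeholders; btl_tcp_if_include is the Open MPI MCA parameter for restricting the TCP BTL to a specific interface, which can matter when a node has more than one interface:

```shell
# Hostfile listing both nodes (slave hostname is a placeholder):
#   kongdragon-master slots=1
#   kongdragon-slave  slots=1

# Launch two processes, one per node:
mpirun -np 2 --hostfile hostfile ./hello

# If either node has more than one network interface, restricting the
# TCP BTL to the crossover-cable interface (eth0 here is an assumption)
# rules out Open MPI picking an interface the other node cannot reach:
mpirun -np 2 --hostfile hostfile --mca btl_tcp_if_include eth0 ./hello
```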