Hi,
I want to build a cluster with Open MPI.
2 nodes:
node 1: 4 x AMD quad-core, Ubuntu 9.04, Open MPI 1.3.2
node 2: Sony PS3, Ubuntu 9.04, Open MPI 1.3
Both nodes can connect via ssh to each other and to themselves without a password.
I can run the sample program pi.c on both nodes separately (see below). But if I
try to start it on node 1 with the --hostfile option to use node 2 remotely, I get
this error:
cluster@bioclust:~$ mpirun --hostfile /etc/openmpi/openmpi-default-hostfile -np 17 /mnt/projects/PS3Cluster/Benchmark/pi
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
My hostfile:
cluster@bioclust:~$ cat /etc/openmpi/openmpi-default-hostfile
10.4.23.107 slots=16
10.4.1.23 slots=2
I can see with top that the processors of node 2 start working briefly, then the
job aborts on node 1.
I use this sample/test program:
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int i, n;
    double h, pi, x;
    int me, nprocs;
    double piece;
    /* --------------------------------------------------- */
    MPI_Init (&argc, &argv);
    MPI_Comm_size (MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank (MPI_COMM_WORLD, &me);
    /* --------------------------------------------------- */
    if (me == 0)
    {
        printf("%s", "Input number of intervals:\n");
        scanf ("%d", &n);
    }
    /* --------------------------------------------------- */
    MPI_Bcast (&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    /* --------------------------------------------------- */
    h = 1. / (double) n;
    piece = 0.;
    for (i=me+1; i <= n; i+=nprocs)
    {
        x = (i-1)*h;
        piece = piece + ( 4/(1+(x)*(x)) + 4/(1+(x+h)*(x+h))) / 2 * h;
    }
    printf("%d: pi = %25.15f\n", me, piece);
    /* --------------------------------------------------- */
    MPI_Reduce (&piece, &pi, 1, MPI_DOUBLE,
                MPI_SUM, 0, MPI_COMM_WORLD);
    /* --------------------------------------------------- */
    if (me == 0)
    {
        printf("pi = %25.15f\n", pi);
    }
    /* --------------------------------------------------- */
    MPI_Finalize();
    return 0;
}
It works on each node when run locally.
node1:
cluster@bioclust:~$ mpirun -np 4 /mnt/projects/PS3Cluster/Benchmark/pi
Input number of intervals:
20
0: pi = 0.822248040052981
2: pi = 0.773339953424083
3: pi = 0.747089984650041
1: pi = 0.798498008827023
pi = 3.141175986954128
node2:
cluster@kasimir:~$ mpirun -np 2 /mnt/projects/PS3Cluster/Benchmark/pi
Input number of intervals:
5
1: pi = 1.267463056905495
0: pi = 1.867463056905495
pi = 3.134926113810990
cluster@kasimir:~$
Thanks in advance,
Laurin