Our Mac expert (Brian Barrett) just recently left the project for
greener pastures. He's the guy who typically answered Mac/XGrid
questions -- I'm afraid that I have no idea how any of that XGrid
stuff works... :-(
Is there anyone else around who can answer XGrid questions? Warner?
On Oct 4, 2007, at 11:29 PM, Jinhui Qin wrote:
Hi,
I have set up an Xgrid consisting of one laptop and 7 Mac mini nodes
(all dual-core machines), and I have installed Open MPI (openmpi
1.2.1) on all nodes. The laptop node (hostname: sib) has three
roles: agent, controller, and client; all the other nodes are
agents only.
When I run "mpirun -n 8 /bin/hostname" in a terminal on the laptop
node, it prints all 8 nodes' hostnames correctly, so Xgrid itself
seems to work fine.
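(A quick way to double-check that the Xgrid launcher is actually the
one being used would be to ask ompi_info which launch components were
built and, if needed, select the Xgrid one explicitly. This is only a
sketch; the assumption is that the component shows up under "pls" with
the name "xgrid" in this 1.2.x build:

  ompi_info | grep -i xgrid                    # was Xgrid support compiled in?
  mpirun --mca pls xgrid -n 8 /bin/hostname    # force the Xgrid launcher
)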
Then I wanted to run a simple MPI program. The source file "Hello.c"
was compiled with mpicc, and the executable "Hello" was copied to
each node under the same path (I have also verified that it runs
properly locally on each node). When I ask for 1 or 2 processors to
run the job, Xgrid works fine, but when I ask for 3 or more
processors, all jobs fail. Below are the commands and the
results/messages that I got.
Can anybody help me out?
*************************************
running "hostname" and the results, they looks good.
*************************************
sib:sharcnet$ mpirun -n 8 /bin/hostname
node2
node8
node4
node5
node3
node7
sib
node6
*************************************
the simple mpi program Hello.c source code
*************************************
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[]) {
int numprocs, rank, namelen;
char processor_name[MPI_MAX_PROCESSOR_NAME];
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Get_processor_name(processor_name, &namelen);
printf("Process %d on %s out of %d\n", rank, processor_name,
numprocs);
MPI_Finalize();
}
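(The build-and-copy step described above was roughly the following --
just a sketch; the node names come from the hostname output and the
directory is the one used in the mpirun commands:

  mpicc Hello.c -o Hello
  for n in node2 node3 node4 node5 node6 node7 node8; do
      scp Hello $n:~/openMPI_stuff/
  done
)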
*************************************
asking for 1 and 2 processors to run "Hello"
-- the results are all good
*************************************
sib:sharcnet$ mpirun -n 1 ~/openMPI_stuff/Hello
Process 0 on sib out of 1
sib:sharcnet$ mpirun -n 2 ~/openMPI_stuff/Hello
Process 1 on node2 out of 2
Process 0 on sib out of 2
*************************************
Here is the problem: when I
ask for 3 processors to run the job,
the following are all the messages I got
*************************************
sib:sharcnet$ mpirun -n 3 ~/openMPI_stuff/Hello
Process 0.1.1 is unable to reach 0.1.2 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
Process 0.1.2 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
PML add procs failed
--> Returned "Unreachable" (-12) instead of "Success" (0)
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
PML add procs failed
--> Returned "Unreachable" (-12) instead of "Success" (0)
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
mpirun noticed that job rank 0 with PID 817 on node xgrid-node-0
exited on signal 15 (Terminated).
sib:sharcnet$
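(Following the hint in the error text above about a missing BTL
component, one thing worth trying -- just a guess; "tcp,self" is an
assumption about which BTLs should be used between these nodes --
would be to name the BTL components explicitly:

  mpirun --mca btl tcp,self -n 3 ~/openMPI_stuff/Hello

If that behaves differently, it would at least narrow the problem
down to BTL selection on the remote agents.)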
--
Jeff Squyres
Cisco Systems