Hi,
Sorry about that. Two lines were cut from the program I posted. Here are
the full program and the error messages again. No resource manager is
involved, just ssh/rsh.
The hostfile contains:
bmi-opt2-01
bmi-opt2-02
bmi-opt2-03
bmi-opt2-04
############################
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include "mpi.h"

int
main(int argc, char **argv)
{
    int tag = 0;
    int my_rank;
    int num_proc;
    /* buffers hold 50 chars so they match the count passed to MPI_Send */
    char message_0[50] = "hello slave, i'm your master";
    char message_1[50];
    char master_data[50] = "slaves to work";
    int array_of_errcodes[10];
    int num;
    MPI_Status status;
    MPI_Comm inter_comm;
    MPI_Info info;
    int arr[1];
    int rc1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_proc);
    printf("MASTER : spawning 3 slaves ... \n");
    /* spawn one process (maxprocs = 1); arr receives its error code */
    rc1 = MPI_Comm_spawn("/bin/hostname", MPI_ARGV_NULL, 1,
                         MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter_comm, arr);
    printf("MASTER : send a message to master of slaves ...\n");
    /* on an intercommunicator, rank 0 is rank 0 of the spawned group */
    MPI_Send(message_0, 50, MPI_CHAR, 0, tag, inter_comm);
    MPI_Recv(message_1, 50, MPI_CHAR, 0, tag, inter_comm, &status);
    printf("MASTER : message received : %s\n", message_1);
    MPI_Send(master_data, 50, MPI_CHAR, 0, tag, inter_comm);
    MPI_Finalize();
    exit(0);
}
#################################
prakash@bmi-opt2-01:~/thesis/CS/Samples/x86_64> mpirun -np 1 --pernode
--prefix /usr/local/openmpi-1.2 --hostfile machinefile ./master1
MASTER : spawning 3 slaves ...
src is (null) and orte type is 0
[bmi-opt2-01:03527] [0,0,0] ORTE_ERROR_LOG: Bad parameter in file
dss/dss_copy.c at line 43
[bmi-opt2-01:03527] [0,0,0] ORTE_ERROR_LOG: Bad parameter in file
gpr_replica_put_get_fn.c at line 410
[bmi-opt2-01:03527] [0,0,0] ORTE_ERROR_LOG: Bad parameter in file
base/rmaps_base_registry_fns.c at line 612
[bmi-opt2-01:03527] [0,0,0] ORTE_ERROR_LOG: Bad parameter in file
base/rmaps_base_map_job.c at line 93
[bmi-opt2-01:03527] [0,0,0] ORTE_ERROR_LOG: Bad parameter in file
base/rmaps_base_receive.c at line 139
mpirun: killing job...
mpirun noticed that job rank 0 with PID 3532 on node bmi-opt2-01 exited
on signal 15 (Terminated).
Thanks,
Prakash
r...@lanl.gov 06/03/07 9:31 PM >>>
Hi Prakash
Are you sure the code you provided here is the one generating the output
you attached? I don't see this message anywhere in your code:
MASTER : spawning 3 slaves ...
and it certainly isn't anything we generate. Also, your output implies
you are in some kind of loop, yet your code contains only a single
comm_spawn.
Could you please clarify?
Thanks
Ralph
On 6/3/07 5:50 AM, "Prakash Velayutham" <prakash.velayut...@cchmc.org>
wrote:
Hello,
Version - Open MPI 1.2.1.
I have a simple program as below:
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include "mpi.h"

int
main(int argc, char **argv)
{
    int tag = 0;
    int my_rank;
    int num_proc;
    /* buffers hold 50 chars so they match the count passed to MPI_Send */
    char message_0[50] = "hello slave, i'm your master";
    char message_1[50];
    char master_data[50] = "slaves to work";
    int num;
    MPI_Status status;
    MPI_Comm inter_comm;
    MPI_Info info;
    int arr[1];
    int rc1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_proc);
    /* spawn one process (maxprocs = 1); arr receives its error code */
    rc1 = MPI_Comm_spawn("/bin/hostname", MPI_ARGV_NULL, 1,
                         MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter_comm, arr);
    printf("MASTER : send a message to master of slaves ...\n");
    /* on an intercommunicator, rank 0 is rank 0 of the spawned group */
    MPI_Send(message_0, 50, MPI_CHAR, 0, tag, inter_comm);
    MPI_Recv(message_1, 50, MPI_CHAR, 0, tag, inter_comm, &status);
    printf("MASTER : message received : %s\n", message_1);
    MPI_Send(master_data, 50, MPI_CHAR, 0, tag, inter_comm);
    MPI_Finalize();
    exit(0);
}
When this is run, all I get is:
~/thesis/CS/Samples/x86_64> mpirun -np 4 --pernode --hostfile
machinefile --prefix /usr/local/openmpi-1.2 ./master1
MASTER : spawning 3 slaves ...
MASTER : spawning 3 slaves ...
MASTER : spawning 3 slaves ...
MASTER : spawning 3 slaves ...
src is (null) and orte type is 0
[bmi-opt2-01:25441] [0,0,0] ORTE_ERROR_LOG: Bad parameter in file
dss/dss_copy.c at line 43
[bmi-opt2-01:25441] [0,0,0] ORTE_ERROR_LOG: Bad parameter in file
gpr_replica_put_get_fn.c at line 410
[bmi-opt2-01:25441] [0,0,0] ORTE_ERROR_LOG: Bad parameter in file
base/rmaps_base_registry_fns.c at line 612
[bmi-opt2-01:25441] [0,0,0] ORTE_ERROR_LOG: Bad parameter in file
base/rmaps_base_map_job.c at line 93
[bmi-opt2-01:25441] [0,0,0] ORTE_ERROR_LOG: Bad parameter in file
base/rmaps_base_receive.c at line 139
mpirun: killing job...
mpirun noticed that job rank 0 with PID 25447 on node bmi-opt2-01 exited
on signal 15 (Terminated).
3 additional processes aborted (not shown)
Any idea what is wrong with this?
Thanks,
Prakash