Ralph,

Ralph H Castain wrote:
> Hmmm...I think I know what may be happening. Could you send me:
>
> 1. what Open MPI version you are using?

Open MPI 1.2.1

> 2. any MCA parameters you might be setting in your environment (remember
> that we may be picking up some system configuration file for those)

How do I get these?

> This isn't related to the problem, but I also note that you are spawning
> "hostname" and then trying to do MPI send/recv with it - I don't think that
> is going to work.

I know. I could not get another client program to start before this, so I
just wanted to check whether /bin/hostname works with the spawn.

> Thanks
> Ralph

Thanks,
Prakash

On 6/5/07 4:16 AM, "Prakash Velayutham" <prakash.velayut...@cchmc.org>
wrote:
Hi,

Sorry about that. Two lines got cut out from the program. Here is the
full program and error messages again. No Resource Manager involved,
just ssh/rsh.

Hostfile contains

bmi-opt2-01
bmi-opt2-02
bmi-opt2-03
bmi-opt2-04

############################
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include "mpi.h"

int
main(int argc, char **argv)
{
        int             tag = 0;
        int             my_rank;
        int             num_proc;
        char            message_0[] = "hello slave, i'm your master";
        char            message_1[50];
        char            master_data[] = "slaves to work";
        int             array_of_errcodes[10];
        int             num;
        MPI_Status      status;
        MPI_Comm        inter_comm;
        MPI_Info        info;
        int             arr[1];         /* one error code per spawned process */
        int             rc1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
        MPI_Comm_size(MPI_COMM_WORLD, &num_proc);
        /* The message says 3, but maxprocs in the spawn call below is 1. */
        printf("MASTER : spawning 3 slaves ... \n");
        rc1 = MPI_Comm_spawn("/bin/hostname", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                             0, MPI_COMM_WORLD, &inter_comm, arr);
        printf("MASTER : send a message to master of slaves ...\n");
        /* Send only what the array holds (string plus NUL), not 50 chars. */
        MPI_Send(message_0, (int)strlen(message_0) + 1, MPI_CHAR, 0, tag,
                 inter_comm);
        MPI_Recv(message_1, 50, MPI_CHAR, 0, tag, inter_comm, &status);
        printf("MASTER : message received : %s\n", message_1);
        MPI_Send(master_data, (int)strlen(master_data) + 1, MPI_CHAR, 0, tag,
                 inter_comm);
        MPI_Finalize();
        exit(0);
}
#################################
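
A side note on the MPI_Send counts: the count argument should not exceed
the number of elements actually stored in the send buffer, and for a C
string that is strlen plus one for the terminating NUL. Below is a minimal
sketch of a helper that computes such a count and also clamps it to the
size of the receiver's buffer. The name send_count and its parameters are
made up for illustration; they are not part of MPI or Open MPI.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not an MPI function): number of chars to pass as
 * the count when sending a NUL-terminated string, including the
 * terminator, clamped to the capacity of the receiving buffer.
 */
static int send_count(const char *msg, size_t recv_capacity)
{
        size_t n;

        assert(msg != NULL);
        assert(recv_capacity > 0);

        n = strlen(msg) + 1;            /* payload plus terminating NUL */
        if (n > recv_capacity)
                n = recv_capacity;      /* never exceed the receive buffer */
        return (int)n;
}
```

With it, a send could be written as MPI_Send(message_0,
send_count(message_0, 50), MPI_CHAR, 0, tag, inter_comm), where 50 is the
size of the receiving message_1 buffer. When the count is clamped, the
terminating NUL is not sent, so the receiver would have to terminate the
string itself.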

prakash@bmi-opt2-01:~/thesis/CS/Samples/x86_64> mpirun -np 1 --pernode --prefix /usr/local/openmpi-1.2 --hostfile machinefile ./master1
MASTER : spawning 3 slaves ...
src is (null) and orte type is 0
[bmi-opt2-01:03527] [0,0,0] ORTE_ERROR_LOG: Bad parameter in file dss/dss_copy.c at line 43
[bmi-opt2-01:03527] [0,0,0] ORTE_ERROR_LOG: Bad parameter in file gpr_replica_put_get_fn.c at line 410
[bmi-opt2-01:03527] [0,0,0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_registry_fns.c at line 612
[bmi-opt2-01:03527] [0,0,0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 93
[bmi-opt2-01:03527] [0,0,0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_receive.c at line 139
mpirun: killing job...

mpirun noticed that job rank 0 with PID 3532 on node bmi-opt2-01 exited on signal 15 (Terminated).

Thanks,
Prakash
r...@lanl.gov 06/03/07 9:31 PM >>>
Hi Prakash

Are you sure the code you provided here is the one generating the output
you attached? I don't see this message anywhere in your code:

MASTER : spawning 3 slaves ...

and it certainly isn't anything we generate. Also, your output implies
you are in some kind of loop, yet your code contains only a single
comm_spawn.

Could you please clarify?

Thanks
Ralph


On 6/3/07 5:50 AM, "Prakash Velayutham" <prakash.velayut...@cchmc.org>
wrote:

Hello,

Version - Open MPI 1.2.1.

I have a simple program as below:

#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include "mpi.h"

int
main(int argc, char **argv)
{

        int             tag = 0;
        int             my_rank;
        int             num_proc;
        char            message_0[] = "hello slave, i'm your master";
        char            message_1[50];
        char            master_data[] = "slaves to work";
        int             num;
        MPI_Status      status;
        MPI_Comm        inter_comm;
        MPI_Info        info;
        int             arr[1];         /* one error code per spawned process */
        int             rc1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
        MPI_Comm_size(MPI_COMM_WORLD, &num_proc);
        rc1 = MPI_Comm_spawn("/bin/hostname", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                             0, MPI_COMM_WORLD, &inter_comm, arr);
        printf("MASTER : send a message to master of slaves ...\n");
        /* Send only what the array holds (string plus NUL), not 50 chars. */
        MPI_Send(message_0, (int)strlen(message_0) + 1, MPI_CHAR, 0, tag,
                 inter_comm);
        MPI_Recv(message_1, 50, MPI_CHAR, 0, tag, inter_comm, &status);
        printf("MASTER : message received : %s\n", message_1);
        MPI_Send(master_data, (int)strlen(master_data) + 1, MPI_CHAR, 0, tag,
                 inter_comm);
        MPI_Finalize();
        exit(0);
}

When this is run, all I get is
~/thesis/CS/Samples/x86_64> mpirun -np 4 --pernode --hostfile machinefile --prefix /usr/local/openmpi-1.2 ./master1
MASTER : spawning 3 slaves ...
MASTER : spawning 3 slaves ...
MASTER : spawning 3 slaves ...
MASTER : spawning 3 slaves ...
src is (null) and orte type is 0
[bmi-opt2-01:25441] [0,0,0] ORTE_ERROR_LOG: Bad parameter in file dss/dss_copy.c at line 43
[bmi-opt2-01:25441] [0,0,0] ORTE_ERROR_LOG: Bad parameter in file gpr_replica_put_get_fn.c at line 410
[bmi-opt2-01:25441] [0,0,0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_registry_fns.c at line 612
[bmi-opt2-01:25441] [0,0,0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 93
[bmi-opt2-01:25441] [0,0,0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_receive.c at line 139
mpirun: killing job...

mpirun noticed that job rank 0 with PID 25447 on node bmi-opt2-01 exited on signal 15 (Terminated).
3 additional processes aborted (not shown)

Any idea what is wrong with this?

Thanks,
Prakash
