Having different values for the "high" parameter is fine.
I think the problem comes from using NULL, NULL instead of &argc,
&argv as the parameters to MPI_Init. This toy application works for me on
trunk. If you still experience trouble on 1.2, please let us know.
**********************
intercomm_merge_parent.c
#include <stdlib.h>
#include <stdio.h>
#include <mpi.h>
#define NKIDS 3
#define CHECK_MPI_CODE(expr) do {                              \
        int ret = (expr);                                      \
        if (MPI_SUCCESS != ret) {                              \
            printf("ERROR %d\tat line %d\n", ret, __LINE__);   \
            return -ret;                                       \
        }                                                      \
    } while (0)

int main(int argc, char *argv[])
{
    int errs[NKIDS];
    MPI_Comm kids;     /* intercommunicator to the spawned children */
    MPI_Comm allmpi;   /* merged intracommunicator */
    int k;

    MPI_Init(&argc, &argv);

    printf("parent calls MPI_Comm_spawn\n");
    CHECK_MPI_CODE( MPI_Comm_spawn("intercomm_merge_child", MPI_ARGV_NULL,
                                   NKIDS, MPI_INFO_NULL, 0, MPI_COMM_WORLD,
                                   &kids, errs) );
    printf("parent call to MPI_Comm_spawn returns\n");
    for (k = 0; k < NKIDS; ++k)
        CHECK_MPI_CODE( errs[k] );

    printf("parent calls MPI_Intercomm_merge\n");
    CHECK_MPI_CODE( MPI_Intercomm_merge(kids, 0, &allmpi) );
    printf("parent MPI_Intercomm_merge returns\n");

    MPI_Finalize();
    return EXIT_SUCCESS;
}
*********************
intercomm_merge_child.c
#include <stdlib.h>
#include <stdio.h>
#include <mpi.h>
#define CHECK_MPI_CODE(expr) do {                              \
        int ret = (expr);                                      \
        if (MPI_SUCCESS != ret) {                              \
            printf("ERROR %d\tat line %d\n", ret, __LINE__);   \
            return -ret;                                       \
        }                                                      \
    } while (0)

int main(int argc, char *argv[])
{
    MPI_Comm parent;   /* intercommunicator back to the spawning parent */
    MPI_Comm allmpi;   /* merged intracommunicator */

    fprintf(stderr, "child calls MPI_Init\n");
    CHECK_MPI_CODE( MPI_Init(&argc, &argv) );
    fprintf(stderr, "child MPI_Init returns\n");

    CHECK_MPI_CODE( MPI_Comm_get_parent(&parent) );

    fprintf(stderr, "child calls MPI_Intercomm_merge\n");
    CHECK_MPI_CODE( MPI_Intercomm_merge(parent, 1, &allmpi) );
    fprintf(stderr, "child call to MPI_Intercomm_merge returns\n");

    MPI_Finalize();
    return EXIT_SUCCESS;
}
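For reference, one way to build and run these (assuming Open MPI's mpicc wrapper, and that the child executable ends up in the directory the parent is launched from): compile each file with mpicc, then launch only the parent, e.g. mpiexec -np 1 ./intercomm_merge_parent; the three children are started by the MPI_Comm_spawn call itself rather than by mpiexec.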
Aurelien
On Jul 28, 2008, at 3:52 PM, Mark Borgerding wrote:
Check.
Parent has high=0
Children have high=1
Jeff Squyres wrote:
Ok, good.
One thing to check is that you have put different values for the
"high" value between the parent group and the children group.
On Jul 28, 2008, at 3:42 PM, Mark Borgerding wrote:
I should've been clearer. I have observed the same behavior under
both of those versions.
I was not using the two versions in the same cluster.
-- Mark
Jeff Squyres wrote:
Are you mixing both v1.2.4 and v1.2.5 in a single MPI job? That
may have unintended side-effects -- we unfortunately do not
guarantee binary compatibility between any of our releases.
On Jul 28, 2008, at 10:16 AM, Mark Borgerding wrote:
I am using version 1.2.4 (Fedora 9) and 1.2.5 ( CentOS 5.2 )
A little clarification:
The children do not actually wake up when the parent *sends*
data to them, but only after the parent tries to receive data
from the merged communicator.
Here is the timeline:
...
parent call to MPI_Comm_spawn returns
parent calls MPI_Intercomm_merge
children call to MPI_Init return
children call MPI_Intercomm_merge
parent MPI_Intercomm_merge returns
(long pause inserted via parent sleep)
parent sends data to kid 1
(long pause inserted via parent sleep)
parent starts to receive data from kid 1
all children's calls to MPI_Intercomm_merge return
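(For context, a rough sketch of the exchange that timeline refers to, assuming the merge leaves the parent as rank 0 and the children as ranks 1..N in allmpi; the int payload and tag 0 below are made up:)

/* parent side, on the merged communicator allmpi */
int value = 42;                                                 /* made-up payload */
MPI_Send(&value, 1, MPI_INT, 1, 0, allmpi);                     /* "parent sends data to kid 1" */
MPI_Recv(&value, 1, MPI_INT, 1, 0, allmpi, MPI_STATUS_IGNORE);  /* "parent starts to receive data from kid 1" */

/* child side (rank 1 in allmpi) */
int value;
MPI_Recv(&value, 1, MPI_INT, 0, 0, allmpi, MPI_STATUS_IGNORE);
MPI_Send(&value, 1, MPI_INT, 0, 0, allmpi);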
-- Mark
Aurélien Bouteiller wrote:
Ok, I'll check to see what happens. Which version of Open MPI
are you using?
Aurelien
On Jul 27, 2008, at 11:13 PM, Mark Borgerding wrote:
I got something working, but I'm not 100% sure why.
The children woke up and returned from their calls to
MPI_Intercomm_merge only after
the parent used the intercomm to send some data to the
children via MPI_Send.
Mark Borgerding wrote:
Perhaps I am doing something wrong. The children's calls to
MPI_Intercomm_merge never return.
Here's the chronology (with 2 children):
parent calls MPI_Init
parent calls MPI_Comm_spawn
child calls MPI_Init
child calls MPI_Init
parent call to MPI_Comm_spawn returns
(long pause inserted)
parent calls MPI_Intercomm_merge
child MPI_Init returns
child calls MPI_Intercomm_merge
child MPI_Init returns
child calls MPI_Intercomm_merge
parent MPI_Intercomm_merge returns
... but the child processes never return from the
MPI_Intercomm_merge function.
Here are some code snippets:
############# parent:
    MPI_Init(NULL, NULL);
    int nkids = 2;
    int errs[nkids];
    MPI_Comm kid;
    cerr << "parent calls MPI_Comm_spawn" << endl;
    CHECK_MPI_CODE( MPI_Comm_spawn("test_mpi", NULL, nkids, MPI_INFO_NULL,
                                   0, MPI_COMM_WORLD, &kid, errs) );
    cerr << "parent call to MPI_Comm_spawn returns" << endl;
    for (k = 0; k < nkids; ++k)
        CHECK_MPI_CODE( errs[k] );
    MPI_Comm allmpi;
    cerr << "(long pause)" << endl;
    sleep(3);
    cerr << "parent calls MPI_Intercomm_merge\n";
    CHECK_MPI_CODE( MPI_Intercomm_merge(kid, 0, &allmpi) );
    cerr << "parent MPI_Intercomm_merge returns\n";
############### child:
    fprintf(stderr, "child calls MPI_Init \n");
    CHECK_MPI_CODE( MPI_Init(NULL, NULL) );
    fprintf(stderr, "child MPI_Init returns\n");
    MPI_Comm parent;
    CHECK_MPI_CODE( MPI_Comm_get_parent(&parent) );
    fprintf(stderr, "child calls MPI_Intercomm_merge \n");
    MPI_Comm allmpi;
    CHECK_MPI_CODE( MPI_Intercomm_merge(parent, 1, &allmpi) );
    fprintf(stderr, "child call to MPI_Intercomm_merge returns\n");
(the above line never gets executed)
Aurélien Bouteiller wrote:
MPI_Intercomm_merge is what you are looking for.
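For reference, its prototype is

int MPI_Intercomm_merge(MPI_Comm intercomm, int high, MPI_Comm *newintracomm);

It takes the intercommunicator (here, the one from MPI_Comm_spawn on the parent side and MPI_Comm_get_parent on the child side) plus a "high" flag that controls which group is ordered first, and gives back a single merged intracommunicator that both sides can use for ordinary Send/Recv.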
Aurelien
On Jul 26, 2008, at 1:23 PM, Mark Borgerding wrote:
Okay, so I've gotten a little bit closer.
I'm using MPI_Comm_spawn to start several child
processes. The problem is that the children are in their
own group, separate from the parent (just like the
documentation says). I want to merge the children's group
with the parent group so I can efficiently Send/Recv data
between them.
Is this possible?
Plan B: I guess if there is no elegant way to merge all
those processes into one group, I can connect sockets and
make intercomms to talk from the parent directly to each
child.
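(As an aside, MPI itself also offers a port-based connect/accept mechanism that yields an intercommunicator per child without raw sockets; a rough sketch, with the out-of-band exchange of the port name omitted:)

/* parent ("server") side */
char port[MPI_MAX_PORT_NAME];
MPI_Comm child_comm;
MPI_Open_port(MPI_INFO_NULL, port);
/* ...hand the contents of "port" to the child somehow (file, argv, ...)... */
MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &child_comm);

/* child ("client") side, once it knows the port name */
MPI_Comm parent_comm;
MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &parent_comm);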
-- Mark
Mark Borgerding wrote:
I am writing a code module that plugs into a larger
application framework. That framework loads my code
module as a shared object.
So I do not control how the first process gets started,
but I still want it to be able to start and participate in
an MPI group.
Here's roughly what I want to happen (I think):
framework app running (not under my control)
-> framework loads the mycode.so shared object into its process
-> mycode.so starts MPI programs on several hosts (e.g. via a system call to mpiexec)
-> the initial mycode.so process participates in the group it just started (e.g. it shows up in MPI_Comm_group, can use MPI_Send, MPI_Recv, etc.)
Can this be done?
I am running under Centos 5.2
Thanks,
Mark
--
Mark Borgerding
3dB Labs, Inc
Innovate. Develop. Deliver.
--
* Dr. Aurélien Bouteiller
* Sr. Research Associate at Innovative Computing Laboratory
* University of Tennessee
* 1122 Volunteer Boulevard, suite 350
* Knoxville, TN 37996
* 865 974 6321