Hi, I have a problem using MPI_Comm_spawn_multiple together with bproc.
I want to use the MPI_Comm_spawn_multiple call to spawn a set of executables, but in a bproc environment the program either crashes or hangs on this call (depending on the Open MPI release used).

I have written a test program that spawns one other program on the same host (cf. code listings at the end of the mail).

* With Open MPI 1.1.2, the program crashes on the MPI_Comm_spawn_multiple call:
<--------------------------------->
[myhost:17061] [0,0,0] ORTE_ERROR_LOG: Not available in file ras_bjs.c at line 253
main_exe: Begining of main_exe
main_exe: Call MPI_Init
main_exe: Call MPI_Comm_spawn_multiple()
[myhost:17061] [0,0,0] ORTE_ERROR_LOG: Not available in file ras_bjs.c at line 253
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:(nil)
[0] func:/usr/local/Mpi/openmpi-1.1.2/lib/libopal.so.0 [0xb7f70ccf]
[1] func:[0xffffe440]
[2] func:/usr/local/Mpi/openmpi-1.1.2/lib/liborte.so.0(orte_schema_base_get_node_tokens+0x7f) [0xb7fdc41f]
[3] func:/usr/local/Mpi/openmpi-1.1.2/lib/liborte.so.0(orte_ras_base_node_assign+0x20b) [0xb7fd230b]
[4] func:/usr/local/Mpi/openmpi-1.1.2/lib/liborte.so.0(orte_ras_base_allocate_nodes+0x41) [0xb7fd0371]
[5] func:/usr/local/Mpi/openmpi-1.1.2/lib/openmpi/mca_ras_hostfile.so [0xb7538ba8]
[6] func:/usr/local/Mpi/openmpi-1.1.2/lib/liborte.so.0(orte_ras_base_allocate+0xd0) [0xb7fd0470]
[7] func:/usr/local/Mpi/openmpi-1.1.2/lib/openmpi/mca_rmgr_urm.so [0xb754d62f]
[8] func:/usr/local/Mpi/openmpi-1.1.2/lib/liborte.so.0(orte_rmgr_base_cmd_dispatch+0x137) [0xb7fd9187]
[9] func:/usr/local/Mpi/openmpi-1.1.2/lib/openmpi/mca_rmgr_urm.so [0xb754e09e]
[10] func:/usr/local/Mpi/openmpi-1.1.2/lib/liborte.so.0 [0xb7fcd00e]
[11] func:/usr/local/Mpi/openmpi-1.1.2/lib/openmpi/mca_oob_tcp.so [0xb7585084]
[12] func:/usr/local/Mpi/openmpi-1.1.2/lib/openmpi/mca_oob_tcp.so [0xb7586763]
[13] func:/usr/local/Mpi/openmpi-1.1.2/lib/libopal.so.0(opal_event_loop+0x199) [0xb7f5f7a9]
[14] func:/usr/local/Mpi/openmpi-1.1.2/lib/libopal.so.0 [0xb7f60353]
[15] func:/lib/tls/libpthread.so.0 [0xb7ef7b63]
[16] func:/lib/tls/libc.so.6(__clone+0x5a) [0xb7e9518a]
*** End of error message ***
<--------------------------------->

* With Open MPI 1.1.1, the program simply hangs on the MPI_Comm_spawn_multiple call:
<--------------------------------->
[myhost:17187] [0,0,0] ORTE_ERROR_LOG: Not available in file ras_bjs.c at line 253
main_exe: Begining of main_exe
main_exe: Call MPI_Init
main_exe: Call MPI_Comm_spawn_multiple()
[myhost:17187] [0,0,0] ORTE_ERROR_LOG: Not available in file ras_bjs.c at line 253
<--------------------------------->

* With Open MPI 1.0.2, the program also hangs on the MPI_Comm_spawn_multiple call, but there is no ORTE_ERROR_LOG message:
<--------------------------------->
main_exe: Begining of main_exe
main_exe: Call MPI_Init
main_exe: Call MPI_Comm_spawn_multiple()
<--------------------------------->

* With Open MPI 1.1.2 in a non-bproc environment, the program works just fine:
<--------------------------------->
main_exe: Begining of main_exe
main_exe: Call MPI_Init
main_exe: Call MPI_Comm_spawn_multiple()
spawned_exe: Begining of spawned_exe
spawned_exe: Call MPI_Init
main_exe: Back from MPI_Comm_spawn_multiple() result = 0
main_exe: Spawned exe returned errcode = 0
spawned_exe: This exe does not do really much thing actually
main_exe: Call MPI_finalize
main_exe: End of main_exe
<--------------------------------->

Can you help me solve this problem?

Regards.
Herve

The bproc release is:
bproc: Beowulf Distributed Process Space Version 4.0.0pre8
bproc: (C) 1999-2003 Erik Hendriks <e...@hendriks.cx>
bproc: Initializing node set. node_ct=1 id_ct=1

The system is Debian sarge with a 2.6.9 kernel patched with bproc.

Finally, here is the ompi_info output for the Open MPI 1.1.2 release:

Open MPI: 1.1.2
Open MPI SVN revision: r12073
Open RTE: 1.1.2
Open RTE SVN revision: r12073
OPAL: 1.1.2
OPAL SVN revision: r12073
Prefix: /usr/local/Mpi/openmpi-1.1.2
Configured architecture: i686-pc-linux-gnu
Configured by: itrsat
Configured on: Mon Oct 23 12:55:17 CEST 2006
Configure host: myhost
Built by: setics
Built on: lun oct 23 13:09:47 CEST 2006
Built host: myhost
C bindings: yes
C++ bindings: yes
Fortran77 bindings: no
Fortran90 bindings: no
Fortran90 bindings size: na
C compiler: gcc
C compiler absolute: /usr/bin/gcc
C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
Fortran77 compiler: none
Fortran77 compiler abs: none
Fortran90 compiler: none
Fortran90 compiler abs: none
C profiling: yes
C++ profiling: yes
Fortran77 profiling: no
Fortran90 profiling: no
C++ exceptions: no
Thread support: posix (mpi: yes, progress: yes)
Internal debug support: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1.2)
MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1.2)
MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1.2)
MCA timer: linux (MCA v1.0, API v1.0, Component v1.1.2)
MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.1.2)
MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1.2)
MCA coll: self (MCA v1.0, API v1.0, Component v1.1.2)
MCA coll: sm (MCA v1.0, API v1.0, Component v1.1.2)
MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1.2)
MCA io: romio (MCA v1.0, API v1.0, Component v1.1.2)
MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1.2)
MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1.2)
MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1.2)
MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1.2)
MCA btl: self (MCA v1.0, API v1.0, Component v1.1.2)
MCA btl: sm (MCA v1.0, API v1.0, Component v1.1.2)
MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA topo: unity (MCA v1.0, API v1.0, Component v1.1.2)
MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
MCA gpr: null (MCA v1.0, API v1.0, Component v1.1.2)
MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1.2)
MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1.2)
MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1.2)
MCA iof: svc (MCA v1.0, API v1.0, Component v1.1.2)
MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1.2)
MCA ns: replica (MCA v1.0, API v1.0, Component v1.1.2)
MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA ras: bjs (MCA v1.0, API v1.0, Component v1.1.2)
MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1.2)
MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1.2)
MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1.2)
MCA ras: lsf_bproc (MCA v1.0, API v1.0, Component v1.1.2)
MCA ras: poe (MCA v1.0, API v1.0, Component v1.1.2)
MCA ras: slurm (MCA v1.0, API v1.0, Component v1.1.2)
MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1.2)
MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1.2)
MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1.2)
MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1.2)
MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1.2)
MCA rml: oob (MCA v1.0, API v1.0, Component v1.1.2)
MCA pls: bproc (MCA v1.0, API v1.0, Component v1.1.2)
MCA pls: bproc_orted (MCA v1.0, API v1.0, Component v1.1.2)
MCA pls: fork (MCA v1.0, API v1.0, Component v1.1.2)
MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1.2)
MCA pls: slurm (MCA v1.0, API v1.0, Component v1.1.2)
MCA sds: bproc (MCA v1.0, API v1.0, Component v1.1.2)
MCA sds: env (MCA v1.0, API v1.0, Component v1.1.2)
MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1.2)
MCA sds: seed (MCA v1.0, API v1.0, Component v1.1.2)
MCA sds: singleton (MCA v1.0, API v1.0, Component v1.1.2)
MCA sds: slurm (MCA v1.0, API v1.0, Component v1.1.2)
MCA soh: bproc (MCA v1.0, API v1.0, Component v1.1.2)

Here are the code listings:

* main_exe.c
<------------------------------------------------------------------->
#include "mpi.h"
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>   /* gethostname */

int main( int argc, char **argv )
{
    /*
     * MPI_Comm_spawn_multiple parameters
     */
    int result, count, root;
    int maxprocs;
    char **commands;
    MPI_Info infos;
    int errcodes;
    MPI_Comm intercomm, newintracomm;
    int rank;
    char hostname[80];

    printf( "main_exe: Begining of main_exe\n");
    printf( "main_exe: Call MPI_Init\n");
    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    /*
     * MPI_Comm_spawn_multiple parameters
     */
    count = 1;
    maxprocs = 1;
    root = rank;
    commands = malloc (sizeof (char *));
    commands[0] = calloc (80, sizeof (char ));
    sprintf (commands[0], "./spawned_exe");

    MPI_Info_create( &infos );

    /* set proc/cpu info */
    result = MPI_Info_set( infos, "soft", "0:1" );

    /* set host info (gethostname needs the size of the buffer) */
    result = gethostname ( hostname, sizeof (hostname) );
    if ( -1 == result )
    {
        printf ("main_exe: Problem in gethostname\n");
    }
    hostname[sizeof (hostname) - 1] = '\0';   /* make sure the name is terminated */
    result = MPI_Info_set( infos, "host", hostname );

    printf( "main_exe: Call MPI_Comm_spawn_multiple()\n");
    result = MPI_Comm_spawn_multiple( count, commands, MPI_ARGVS_NULL,
                                      &maxprocs, &infos, root,
                                      MPI_COMM_WORLD, &intercomm, &errcodes );
    printf( "main_exe: Back from MPI_Comm_spawn_multiple() result = %d\n", result);
    printf( "main_exe: Spawned exe returned errcode = %d\n", errcodes );

    MPI_Intercomm_merge( intercomm, 0, &newintracomm );

    /* Synchronisation with spawned exe */
    MPI_Barrier( newintracomm );

    free( commands[0] );
    free( commands );
    MPI_Comm_free( &newintracomm );

    printf( "main_exe: Call MPI_finalize\n");
    MPI_Finalize( );
    printf( "main_exe: End of main_exe\n");
    return 0;
}
<------------------------------------------------------------------->

* spawned_exe.c
<------------------------------------------------------------------->
#include "mpi.h"
#include <stdio.h>

int main( int argc, char **argv )
{
    MPI_Comm parent, newintracomm;

    printf ("spawned_exe: Begining of spawned_exe\n");
    printf( "spawned_exe: Call MPI_Init\n");
    MPI_Init( &argc, &argv );

    /* Merge with the parent so both sides share one intracommunicator */
    MPI_Comm_get_parent ( &parent );
    MPI_Intercomm_merge ( parent, 1, &newintracomm );

    printf( "spawned_exe: This exe does not do really much thing actually\n" );

    /* Synchronisation with main exe */
    MPI_Barrier( newintracomm );

    MPI_Comm_free( &newintracomm );

    printf( "spawned_exe: Call MPI_finalize\n");
    MPI_Finalize( );
    printf( "spawned_exe: End of spawned_exe\n");
    return 0;
}
<------------------------------------------------------------------->
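
A side note on the host lookup in main_exe.c: the "host" info key is only meaningful if gethostname() is given the real size of its buffer and the result is NUL-terminated (POSIX leaves termination unspecified when the name is truncated). Below is a minimal sketch of a wrapper for this, assuming a POSIX system; get_local_hostname is a hypothetical helper name that is not part of the test program, and it is probably unrelated to the bproc failure itself.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (an illustration, not part of the original test
 * program): fill buf with the local host name and guarantee that the
 * result is NUL-terminated.
 * Returns 0 on success, -1 on bad arguments or if gethostname() fails.
 */
static int get_local_hostname(char *buf, size_t buflen)
{
    if (buf == NULL || buflen == 0) {
        return -1;
    }
    if (gethostname(buf, buflen) != 0) {
        buf[0] = '\0';          /* leave a valid empty string on failure */
        return -1;
    }
    buf[buflen - 1] = '\0';     /* termination is unspecified on truncation */
    return 0;
}
```

In main_exe.c this could be called as get_local_hostname(hostname, sizeof (hostname)) before the MPI_Info_set( infos, "host", hostname ) call.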