Thank you for your quick reply, Ralph.

As far as I know, the NODES environment variable is created when a job is submitted to the bjs scheduler. The only way I know to get such an allocation (but I am a bproc newbie) is to use the bjssub command.
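To check whether the allocation is really visible to my processes, here is a minimal sketch (illustration only, not part of my test program) that simply prints the NODES variable bjs is expected to set. I would expect it to print the node list when launched through bjssub and nothing when launched directly:

<-------------------------------->
/* check_nodes.c (hypothetical name): print the NODES variable that the
 * bjs scheduler is expected to export to the jobs it launches. */
#include <stdio.h>
#include <stdlib.h>

int main( void )
{
    const char *nodes = getenv( "NODES" );

    if ( NULL == nodes ) {
        printf( "NODES is not set in this environment\n" );
    } else {
        printf( "NODES = %s\n", nodes );
    }
    return 0;
}
<-------------------------------->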
Then I retried my test with the following command: "bjssub -i mpirun -np 1 main_exe". This time the log was:

<-------------------------------->
main_exe: Begining of main_exe
main_exe: Call MPI_Init
main_exe: Call MPI_Comm_spawn_multiple()
--------------------------------------------------------------------------
Some of the requested hosts are not included in the current allocation for
the application:
  ./spawned_exe
The requested hosts were:
  myhost
Verify that you have mapped the allocated resources properly using the
--host specification.
--------------------------------------------------------------------------
[myhost:22901] [0,0,0] ORTE_ERROR_LOG: Out of resource in file base/rmaps_base_node.c at line 210
[myhost:22901] [0,0,0] ORTE_ERROR_LOG: Out of resource in file rmaps_rr.c at line 331
<------------------------>

I guess this problem comes from the way I set the parameters for the spawned program. Instead of giving instructions to spawn the program on a specific host, I should probably set parameters to spawn the program on a specific node, but I do not know how to do that. Here below is the source code associated with the MPI_Info setting for the spawn call:

<------------------>
/* set proc/cpu info */
result = MPI_Info_set( infos, "soft", "0:1" );

/* set host info */
result = gethostname ( hostname, len);
if ( -1 == result ) {
    printf ("main_exe: Problem in gethostname\n");
}
result = MPI_Info_set( infos, "host", hostname );
<------------------>

I have tried to replace the line:

    result = MPI_Info_set( infos, "host", hostname );

with something like:

    result = MPI_Info_set( infos, "node", "0" );

but with this change, main_exe remains stuck on the MPI_Comm_spawn_multiple call.

So I have a few questions:
- When MPI is used together with bproc, is it necessary to use bjssub (or bjs in general)?
- Do I also have to submit the spawned program to bjs, i.e. do I have to add 'bjssub' to the commands parameter of the MPI_Comm_spawn_multiple call?

As you can see, I am still not able to spawn a program and need some more help. Do you have some examples describing how to do it? (I have put a small sketch of the MPI_Info setup I am experimenting with after the quoted code listings at the end of this mail.)

Regards.

Herve

Date: Mon, 30 Oct 2006 09:00:47 -0700
From: Ralph H Castain <r...@lanl.gov>
Subject: Re: [OMPI users] MPI_Comm_spawn multiple bproc support problem
To: "Open MPI Users <us...@open-mpi.org>" <us...@open-mpi.org>

On 1.1.2, what that error is telling you is that it didn't find any nodes in the environment. The bproc allocator looks for an environmental variable NODES that contains a list of nodes assigned to you. This error indicates it didn't find anything.

Did you get an allocation prior to running the job? Could you check to see if NODES appears in your environment?

Ralph


On 10/30/06 8:47 AM, "hpe...@infonie.fr" <hpe...@infonie.fr> wrote:

> Hi,
> I have a problem using MPI_Comm_spawn_multiple together with bproc.
>
> I want to use the MPI_Comm_spawn_multiple call to spawn a set of executables, but in a
> bproc environment the program crashes or is stuck on this call (depending on
> the Open MPI release used).
>
> I have created one test program that spawns one other program on the same host
> (cf. code listing at the end of the mail).
>
> * With Open MPI 1.1.2, the program crashes on the MPI_Comm_spawn_multiple call:
> <--------------------------------->
> [myhost:17061] [0,0,0] ORTE_ERROR_LOG: Not available in file ras_bjs.c at line 253
> main_exe: Begining of main_exe
> main_exe: Call MPI_Init
> main_exe: Call MPI_Comm_spawn_multiple()
> [myhost:17061] [0,0,0] ORTE_ERROR_LOG: Not available in file ras_bjs.c at line 253
> Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
> Failing at addr:(nil)
> [0] func:/usr/local/Mpi/openmpi-1.1.2/lib/libopal.so.0 [0xb7f70ccf]
> [1] func:[0xffffe440]
> [2] func:/usr/local/Mpi/openmpi-1.1.2/lib/liborte.so.0(orte_schema_base_get_node_tokens+0x7f) [0xb7fdc41f]
> [3] func:/usr/local/Mpi/openmpi-1.1.2/lib/liborte.so.0(orte_ras_base_node_assign+0x20b) [0xb7fd230b]
> [4] func:/usr/local/Mpi/openmpi-1.1.2/lib/liborte.so.0(orte_ras_base_allocate_nodes+0x41) [0xb7fd0371]
> [5] func:/usr/local/Mpi/openmpi-1.1.2/lib/openmpi/mca_ras_hostfile.so [0xb7538ba8]
> [6] func:/usr/local/Mpi/openmpi-1.1.2/lib/liborte.so.0(orte_ras_base_allocate+0xd0) [0xb7fd0470]
> [7] func:/usr/local/Mpi/openmpi-1.1.2/lib/openmpi/mca_rmgr_urm.so [0xb754d62f]
> [8] func:/usr/local/Mpi/openmpi-1.1.2/lib/liborte.so.0(orte_rmgr_base_cmd_dispatch+0x137) [0xb7fd9187]
> [9] func:/usr/local/Mpi/openmpi-1.1.2/lib/openmpi/mca_rmgr_urm.so [0xb754e09e]
> [10] func:/usr/local/Mpi/openmpi-1.1.2/lib/liborte.so.0 [0xb7fcd00e]
> [11] func:/usr/local/Mpi/openmpi-1.1.2/lib/openmpi/mca_oob_tcp.so [0xb7585084]
> [12] func:/usr/local/Mpi/openmpi-1.1.2/lib/openmpi/mca_oob_tcp.so [0xb7586763]
> [13] func:/usr/local/Mpi/openmpi-1.1.2/lib/libopal.so.0(opal_event_loop+0x199) [0xb7f5f7a9]
> [14] func:/usr/local/Mpi/openmpi-1.1.2/lib/libopal.so.0 [0xb7f60353]
> [15] func:/lib/tls/libpthread.so.0 [0xb7ef7b63]
> [16] func:/lib/tls/libc.so.6(__clone+0x5a) [0xb7e9518a]
> *** End of error message ***
> <----------------------------------------------->
>
> * With Open MPI 1.1.1, the program is simply stuck on the MPI_Comm_spawn_multiple call:
> <--------------------------------->
> [myhost:17187] [0,0,0] ORTE_ERROR_LOG: Not available in file ras_bjs.c at line 253
> main_exe: Begining of main_exe
> main_exe: Call MPI_Init
> main_exe: Call MPI_Comm_spawn_multiple()
> [myhost:17187] [0,0,0] ORTE_ERROR_LOG: Not available in file ras_bjs.c at line 253
> <--------------------------------->
>
> * With Open MPI 1.0.2, the program is also stuck on the MPI_Comm_spawn_multiple call but there is no ORTE_ERROR_LOG:
> <--------------------------------->
> main_exe: Begining of main_exe
> main_exe: Call MPI_Init
> main_exe: Call MPI_Comm_spawn_multiple()
> <--------------------------------->
>
> * With Open MPI 1.1.2 in a non-bproc environment, the program works just fine:
> <--------------------------------->
> main_exe: Begining of main_exe
> main_exe: Call MPI_Init
> main_exe: Call MPI_Comm_spawn_multiple()
> spawned_exe: Begining of spawned_exe
> spawned_exe: Call MPI_Init
> main_exe: Back from MPI_Comm_spawn_multiple() result = 0
> main_exe: Spawned exe returned errcode = 0
> spawned_exe: This exe does not do really much thing actually
> main_exe: Call MPI_finalize
> main_exe: End of main_exe
> <--------------------------------->
>
> Can you help me to solve this problem?
>
> Regards.
>
> Herve
>
>
> The bproc release is:
> bproc: Beowulf Distributed Process Space Version 4.0.0pre8
> bproc: (C) 1999-2003 Erik Hendriks <e...@hendriks.cx>
> bproc: Initializing node set.  node_ct=1 id_ct=1
>
> The system is a Debian sarge with a 2.6.9 kernel installed and patched with bproc.
>
> Finally, I provide the ompi_info log for the Open MPI 1.1.2 release:
>                 Open MPI: 1.1.2
>    Open MPI SVN revision: r12073
>                 Open RTE: 1.1.2
>    Open RTE SVN revision: r12073
>                     OPAL: 1.1.2
>        OPAL SVN revision: r12073
>                   Prefix: /usr/local/Mpi/openmpi-1.1.2
>  Configured architecture: i686-pc-linux-gnu
>            Configured by: itrsat
>            Configured on: Mon Oct 23 12:55:17 CEST 2006
>           Configure host: myhost
>                 Built by: setics
>                 Built on: lun oct 23 13:09:47 CEST 2006
>               Built host: myhost
>               C bindings: yes
>             C++ bindings: yes
>       Fortran77 bindings: no
>       Fortran90 bindings: no
>  Fortran90 bindings size: na
>               C compiler: gcc
>      C compiler absolute: /usr/bin/gcc
>             C++ compiler: g++
>    C++ compiler absolute: /usr/bin/g++
>       Fortran77 compiler: none
>   Fortran77 compiler abs: none
>       Fortran90 compiler: none
>   Fortran90 compiler abs: none
>              C profiling: yes
>            C++ profiling: yes
>      Fortran77 profiling: no
>      Fortran90 profiling: no
>           C++ exceptions: no
>           Thread support: posix (mpi: yes, progress: yes)
>   Internal debug support: no
>      MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
>          libltdl support: yes
>               MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1.2)
>            MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1.2)
>            MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1.2)
>                MCA timer: linux (MCA v1.0, API v1.0, Component v1.1.2)
>            MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>            MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>                 MCA coll: basic (MCA v1.0, API v1.0, Component v1.1.2)
>                 MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1.2)
>                 MCA coll: self (MCA v1.0, API v1.0, Component v1.1.2)
>                 MCA coll: sm (MCA v1.0, API v1.0, Component v1.1.2)
>                 MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1.2)
>                   MCA io: romio (MCA v1.0, API v1.0, Component v1.1.2)
>                MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1.2)
>               MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA btl: self (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA btl: sm (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
>                 MCA topo: unity (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
>                  MCA gpr: null (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA iof: svc (MCA v1.0, API v1.0, Component v1.1.2)
>                   MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1.2)
>                   MCA ns: replica (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>                  MCA ras: bjs (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA ras: lsf_bproc (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA ras: poe (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA ras: slurm (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1.2)
>                MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1.2)
>                 MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1.2)
>                 MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA rml: oob (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA pls: bproc (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA pls: bproc_orted (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA pls: fork (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA pls: slurm (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA sds: bproc (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA sds: env (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA sds: seed (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA sds: singleton (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA sds: slurm (MCA v1.0, API v1.0, Component v1.1.2)
>                  MCA soh: bproc (MCA v1.0, API v1.0, Component v1.1.2)
>
> Here below, the code listings:
>
> * main_exe.c
> <------------------------------------------------------------------->
> #include "mpi.h"
> #include <stdlib.h>
> #include <stdio.h>
> #include <unistd.h>
> int gethostname(char *nom, size_t lg);
>
> int main( int argc, char **argv ) {
>
>     /*
>      * MPI_Comm_spawn_multiple parameters
>      */
>     int result, count, root;
>     int maxprocs;
>     char **commands;
>     MPI_Info infos;
>     int errcodes;
>
>     MPI_Comm intercomm, newintracomm;
>     int rank;
>     char hostname[80];
>     int len;
>
>     printf( "main_exe: Begining of main_exe\n");
>     printf( "main_exe: Call MPI_Init\n");
>     MPI_Init( &argc, &argv );
>     MPI_Comm_rank( MPI_COMM_WORLD, &rank );
>
>     /*
>      * MPI_Comm_spawn_multiple parameters
>      */
>     count = 1;
>     maxprocs = 1;
>     root = rank;
>
>     commands = malloc (sizeof (char *));
>     commands[0] = calloc (80, sizeof (char ));
>     sprintf (commands[0], "./spawned_exe");
>
>     MPI_Info_create( &infos );
>
>     /* set proc/cpu info */
>     result = MPI_Info_set( infos, "soft", "0:1" );
>
>     /* set host info */
>     result = gethostname ( hostname, len);
>     if ( -1 == result ) {
>         printf ("main_exe: Problem in gethostname\n");
>     }
>     result = MPI_Info_set( infos, "host", hostname );
>
>     printf( "main_exe: Call MPI_Comm_spawn_multiple()\n");
>     result = MPI_Comm_spawn_multiple( count,
>                                       commands,
>                                       MPI_ARGVS_NULL,
>                                       &maxprocs,
>                                       &infos,
>                                       root,
>                                       MPI_COMM_WORLD,
>                                       &intercomm,
>                                       &errcodes );
>     printf( "main_exe: Back from MPI_Comm_spawn_multiple() result = %d\n", result);
>     printf( "main_exe: Spawned exe returned errcode = %d\n", errcodes );
>
>     MPI_Intercomm_merge( intercomm, 0, &newintracomm );
>
>     /* Synchronisation with spawned exe */
>     MPI_Barrier( newintracomm );
>
>     free( commands[0] );
>     free( commands );
>     MPI_Comm_free( &newintracomm );
>
>     printf( "main_exe: Call MPI_finalize\n");
>     MPI_Finalize( );
>
>     printf( "main_exe: End of main_exe\n");
>     return 0;
> }
> <------------------------------------------------------------------->
>
> * spawned_exe.c
> <------------------------------------------------------------------->
> #include "mpi.h"
> #include <stdio.h>
>
> int main( int argc, char **argv ) {
>     MPI_Comm parent, newintracomm;
>
>     printf ("spawned_exe: Begining of spawned_exe\n");
>     printf( "spawned_exe: Call MPI_Init\n");
>     MPI_Init( &argc, &argv );
>
>     MPI_Comm_get_parent ( &parent );
>     MPI_Intercomm_merge ( parent, 1, &newintracomm );
>
>     printf( "spawned_exe: This exe does not do really much thing actually\n" );
>
>     /* Synchronisation with main exe */
>     MPI_Barrier( newintracomm );
>
>     MPI_Comm_free( &newintracomm );
>
>     printf( "spawned_exe: Call MPI_finalize\n");
>     MPI_Finalize( );
>
>     printf( "spawned_exe: End of spawned_exe\n");
>     return 0;
> }
> <------------------------------------------------------------------->
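As mentioned above, here is a minimal sketch of the MPI_Info setup I am experimenting with (illustration only: the helper name spawn_on_local_host is just for this sketch, and whether the "host" value should be a hostname or a bproc node under bjs is exactly what I do not know). Unlike in the main_exe.c listing, the length passed to gethostname() is taken from sizeof(hostname) rather than from the uninitialized len variable:

<------------------------------------------------------------------->
/* Sketch only: spawn one copy of ./spawned_exe on the local host by
 * setting the "host" info key before MPI_Comm_spawn_multiple(). */
#include "mpi.h"
#include <stdio.h>
#include <unistd.h>

static int spawn_on_local_host( MPI_Comm comm, int root, MPI_Comm *intercomm )
{
    char     hostname[80];
    char    *commands[1] = { "./spawned_exe" };
    int      maxprocs[1] = { 1 };
    int      errcodes[1];
    int      result;
    MPI_Info infos[1];

    /* host info: the requested name has to match a node of the
     * current allocation, as the error message above suggests */
    if ( 0 != gethostname( hostname, sizeof(hostname) ) ) {
        printf( "spawn_on_local_host: Problem in gethostname\n" );
        return MPI_ERR_OTHER;
    }

    MPI_Info_create( &infos[0] );
    /* proc/cpu info, as in the listing above */
    MPI_Info_set( infos[0], "soft", "0:1" );
    MPI_Info_set( infos[0], "host", hostname );

    result = MPI_Comm_spawn_multiple( 1, commands, MPI_ARGVS_NULL,
                                      maxprocs, infos, root, comm,
                                      intercomm, errcodes );
    MPI_Info_free( &infos[0] );
    return result;
}
<------------------------------------------------------------------->

In main_exe.c this would correspond to the block between MPI_Info_create() and the MPI_Comm_spawn_multiple() call.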