I've found that I always have to use mpirun to start my spawner process, due to the exact problem you are having: the need to give OMPI a hostfile! The singleton functionality seems to be lacking here; it won't allow you to spawn on arbitrary hosts. I have not tested whether this is fixed in the 1.3 series.
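For reference, here is a minimal sketch of the spawn call that this workaround relies on, to go with the mpiexec command suggested just below. The op2-1/op2-2 hostnames are the ones from this thread; "./child" is just a placeholder executable name, and error handling is omitted:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm children;
    MPI_Info info;
    int errs[2];

    MPI_Init(&argc, &argv);

    /* This parent is assumed to have been started with op2-2 already in its
     * allocation, e.g. via "mpiexec -np 1 -H op2-1,op2-2 ...". */
    MPI_Info_create(&info);
    MPI_Info_set(info, "host", "op2-2");

    /* Spawn two copies of the (placeholder) child executable on op2-2. */
    MPI_Comm_spawn("./child", MPI_ARGV_NULL, 2, info, 0,
                   MPI_COMM_SELF, &children, errs);

    MPI_Info_free(&info);
    MPI_Comm_disconnect(&children);   /* children should disconnect from their parent too */
    MPI_Finalize();
    return 0;
}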
Try:

  mpiexec -np 1 -H op2-1,op2-2 spawner op2-2

mpiexec should start the first process on op2-1, and the spawn call should start the children on op2-2. If you don't use the Info object to set the hostname explicitly, then on 1.2.x the children will automatically start on op2-2. With 1.3, the spawn call will place processes starting with the first item in the host list.

mch

2008/7/29 Mark Borgerding <ma...@3db-labs.com>:
> Yes. The host names are listed in the host file, e.g.
>
>   "op2-1 slots=8"
>
> and there is an IP address for op2-1 in the /etc/hosts file.
> I've read the FAQ. Everything in there seems to assume I am starting the
> process group with mpirun or one of its brothers. This is not the case.
>
> I've created and attached a sample source file that demonstrates my problem.
> It participates in an MPI group in one of two ways: either from mpiexec or
> via MPI_Comm_spawn.
>
> Case 1 works: I can run it on the remote node op2-1 by using mpiexec
>   mpiexec -np 3 -H op2-1 spawner
>
> Case 2 works: I can run it on the current host with MPI_Comm_spawn
>   ./spawner `hostname`
>
> Case 3 does not work: I cannot use MPI_Comm_spawn to start a group on a
> remote node.
>   ./spawner op2-1
>
> The output from case 3 is:
> <QUOTE>
> I am going to spawn 2 children on op2-1
> --------------------------------------------------------------------------
> Some of the requested hosts are not included in the current allocation for
> the application:
>   ./spawner
> The requested hosts were:
>   op2-1
>
> Verify that you have mapped the allocated resources properly using the
> --host specification.
> --------------------------------------------------------------------------
> [gardner:32745] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
> base/rmaps_base_support_fns.c at line 225
> [gardner:32745] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
> rmaps_rr.c at line 478
> [gardner:32745] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
> base/rmaps_base_map_job.c at line 210
> [gardner:32745] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
> rmgr_urm.c at line 372
> [gardner:32745] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
> communicator/comm_dyn.c at line 608
> </QUOTE>
>
> Ralph Castain wrote:
>>
>> OMPI doesn't care what your hosts are named - many of us use names that
>> have no numeric pattern or any other discernible pattern to them.
>>
>> OMPI_MCA_rds_hostfile_path should point to a file that contains a list of
>> the hosts - have you ensured that it does, and that the hostfile format is
>> correct? Check the FAQ on the open-mpi.org site:
>>
>> http://www.open-mpi.org/faq/?category=running#simple-spmd-run
>>
>> There are several explanations there pertaining to hostfiles.
>>
>>
>> On Jul 29, 2008, at 11:57 AM, Mark Borgerding wrote:
>>
>>> I listed the node names in the path named by "ompi_info --param rds
>>> hostfile" -- no luck.
>>> I also tried copying that file to another location and setting
>>> OMPI_MCA_rds_hostfile_path -- no luck.
>>>
>>> The remote hosts are named op2-1 and op2-2. Could this be another case
>>> of the problem I saw a few days ago where the hostnames were assumed to
>>> contain a numeric pattern?
>>>
>>> -- Mark
>>>
>>>
>>> Ralph Castain wrote:
>>>>
>>>> For the 1.2 release, I believe you will find the enviro param is
>>>> OMPI_MCA_rds_hostfile_path - you can check that with "ompi_info".
>>>>
>>>>
>>>> On Jul 29, 2008, at 11:10 AM, Mark Borgerding wrote:
>>>>
>>>>> Umm ... what -hostfile file?
>>>>>
>>>>> I am not starting anything via mpiexec/orterun so there is no
>>>>> "-hostfile" argument AFAIK.
>>>>> Is there some other way to communicate this? An environment variable or
>>>>> mca param?
>>>>>
>>>>> -- Mark
>>>>>
>>>>>
>>>>> Ralph Castain wrote:
>>>>>>
>>>>>> Are the hosts where you want the children to go in your -hostfile
>>>>>> file? All of the hosts you intend to use have to be in that file, even if
>>>>>> they don't get used until the comm_spawn.
>>>>>>
>>>>>>
>>>>>> On Jul 29, 2008, at 9:08 AM, Mark Borgerding wrote:
>>>>>>
>>>>>>> I've tried lots of different values for the "host" key in the info
>>>>>>> handle.
>>>>>>> I've tried hardcoding the hostname+ip entries in the /etc/hosts file
>>>>>>> -- no luck. I cannot get my MPI_Comm_spawn children to go anywhere
>>>>>>> else on the network.
>>>>>>>
>>>>>>> mpiexec can start groups on the other machines just fine. It seems
>>>>>>> like there is some initialization that is done by orterun but not by
>>>>>>> MPI_Comm_spawn.
>>>>>>>
>>>>>>> Is there a document that describes how the default process management
>>>>>>> works?
>>>>>>> I do not have infiniband, myrinet or any specialized rte, just ssh.
>>>>>>> All the machines are CentOS 5.2 (openmpi 1.2.5)
>>>>>>>
>>>>>>> -- Mark
>>>>>>>
>>>>>>> Ralph Castain wrote:
>>>>>>>>
>>>>>>>> The string "localhost" may not be recognized in the 1.2 series for
>>>>>>>> comm_spawn. Do a "hostname" and use that string instead - should work.
>>>>>>>>
>>>>>>>> Ralph
>>>>>>>>
>>>>>>>> On Jul 28, 2008, at 10:38 AM, Mark Borgerding wrote:
>>>>>>>>
>>>>>>>>> When I add the info parameter in MPI_Comm_spawn, I get the error
>>>>>>>>> "Some of the requested hosts are not included in the current
>>>>>>>>> allocation for the application:
>>>>>>>>> [...]
>>>>>>>>> Verify that you have mapped the allocated resources properly using
>>>>>>>>> the --host specification."
>>>>>>>>>
>>>>>>>>> Here is a snippet of my code that causes the error:
>>>>>>>>>
>>>>>>>>>   MPI_Info info;
>>>>>>>>>   MPI_Info_create( &info );
>>>>>>>>>   MPI_Info_set(info,"host","localhost");
>>>>>>>>>   MPI_Comm_spawn( cmd , MPI_ARGV_NULL , nkids , info , 0 ,
>>>>>>>>>                   MPI_COMM_SELF , &kid , errs );
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Mark Borgerding wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks, I don't know how I missed that. Perhaps I got thrown off by
>>>>>>>>>> "Portable programs not requiring detailed control over process
>>>>>>>>>> locations should use MPI_INFO_NULL."
>>>>>>>>>>
>>>>>>>>>> If there were a computing equivalent of Maslow's Hierarchy of
>>>>>>>>>> Needs, functioning would be more fundamental than portability :)
>>>>>>>>>>
>>>>>>>>>> -- Mark
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Ralph Castain wrote:
>>>>>>>>>>>
>>>>>>>>>>> Take a look at the man page for MPI_Comm_spawn. It should explain
>>>>>>>>>>> that you need to create an MPI_Info key that has the key of "host"
>>>>>>>>>>> and a value that contains a comma-delimited list of hosts to be
>>>>>>>>>>> used for the child processes.
>>>>>>>>>>>
>>>>>>>>>>> Hope that helps
>>>>>>>>>>> Ralph
>>>>>>>>>>>
>>>>>>>>>>> On Jul 28, 2008, at 8:54 AM, Mark Borgerding wrote:
>>>>>>>>>>>
>>>>>>>>>>>> How does openmpi decide which hosts are used with
>>>>>>>>>>>> MPI_Comm_spawn? All the docs I've found talk about specifying
>>>>>>>>>>>> hosts on the mpiexec/mpirun command and so are not applicable.
>>>>>>>>>>>> I am unable to spawn on anything but localhost (which makes for
>>>>>>>>>>>> a pretty uninteresting cluster).
>>>>>>>>>>>>
>>>>>>>>>>>> When I run
>>>>>>>>>>>>   ompi_info --param rds hostfile
>>>>>>>>>>>> it reports
>>>>>>>>>>>>   MCA rds: parameter "rds_hostfile_path" (current value:
>>>>>>>>>>>>   "/usr/lib/openmpi/1.2.5-gcc/etc/openmpi-default-hostfile")
>>>>>>>>>>>> I tried changing that file but it has no effect.
>>>>>>>>>>>>
>>>>>>>>>>>> I am using
>>>>>>>>>>>>   openmpi 1.2.5
>>>>>>>>>>>>   CentOS 5.2
>>>>>>>>>>>>   ethernet TCP
>>>>>>>>>>>>
>>>>>>>>>>>> -- Mark
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Mark Borgerding
>>>>>>>>> 3dB Labs, Inc
>>>>>>>>> Innovate. Develop. Deliver.
>
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <mpi.h>
>
> /*
>  * (new BSD license)
>  *
>  * Copyright (c) 2008 Mark Borgerding
>  * All rights reserved.
>  *
>  * Redistribution and use in source and binary forms, with or without
>  * modification, are permitted provided that the following conditions are met:
>  *
>  *   * Redistributions of source code must retain the above copyright notice,
>  *     this list of conditions and the following disclaimer.
>  *   * Redistributions in binary form must reproduce the above copyright
>  *     notice, this list of conditions and the following disclaimer in the
>  *     documentation and/or other materials provided with the distribution.
>  *   * Neither the author nor the names of any contributors may be used to
>  *     endorse or promote products derived from this software without
>  *     specific prior written permission.
>  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
>  * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
>  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
>  * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
>  * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
>  * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
>  * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
>  * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
>  * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
>  * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
>  * THE POSSIBILITY OF SUCH DAMAGE.
>  */
>
> int main(int argc, char ** argv)
> {
>     MPI_Comm parent;
>     MPI_Comm allmpi;
>     MPI_Info info;
>     MPI_Comm icom;
>     MPI_Status status;
>     int i,k,rank,size,length,count;
>     char name[256];
>
>     MPI_Init(NULL,NULL);
>     MPI_Comm_get_parent(&parent);
>
>     if ( parent == MPI_COMM_NULL ) {
>         MPI_Comm_size(MPI_COMM_WORLD,&size);
>         if (size>1) {
>             fprintf(stderr,"I think I was started by orterun\n");
>             MPI_Comm_dup(MPI_COMM_WORLD,&allmpi);
>         }else{
>             if (argc<2) {
>                 fprintf(stderr,"please provide a host argument (will be placed in MPI_Info for MPI_Comm_spawn)\n");
>                 return 1;
>             }
>             fprintf(stderr,"I am going to spawn 2 children on %s\n",argv[1]);
>             int errs[2];
>
>             MPI_Info_create( &info );
>             MPI_Info_set(info,"host",argv[1]);
>             MPI_Comm_spawn(argv[0],MPI_ARGV_NULL,2,info,0,MPI_COMM_WORLD,&icom,errs);
>             MPI_Intercomm_merge( icom, 0, &allmpi);
>             MPI_Info_free(&info);
>         }
>     }else{
>         fprintf(stderr,"I was started by MPI_Comm_spawn\n");
>         MPI_Intercomm_merge( parent, 1, &allmpi);
>     }
>
>     MPI_Comm_rank(allmpi,&rank);
>     MPI_Comm_size(allmpi,&size);
>     MPI_Get_processor_name(name,&length);
>     fprintf(stderr,"Hello my name is %s. I am %d of %d\n",name,rank,size);
>
>     if (rank==0) {
>         int k;
>         float buf[128];
>         memset(buf,0,sizeof(buf));
>         fprintf(stderr,"rank zero sending data to all others\n");
>         for (k=1;k<size;++k)
>             MPI_Send( buf , 128 , MPI_FLOAT, k, 42 , allmpi);
>         fprintf(stderr,"rank zero receiving data from all others\n");
>
>         for (k=1;k<size;++k) {
>             MPI_Recv( buf , 128 , MPI_FLOAT, k, 42 , allmpi,&status);
>             MPI_Get_count( &status, MPI_FLOAT, &count);
>             if (count != 128) {
>                 fprintf(stderr,"short read from %d (count=%d)\n",k,count);
>                 exit(1);
>             }
>         }
>     }else{
>         float buf[128];
>         MPI_Recv( buf , 128 , MPI_FLOAT, 0, 42 , allmpi,&status);
>         MPI_Get_count( &status, MPI_FLOAT, &count);
>         if (count != 128) {
>             fprintf(stderr,"short read from 0 (count=%d)\n",count);
>             exit(1);
>         }
>         MPI_Send( buf , 128 , MPI_FLOAT, 0, 42 , allmpi);
>     }
>     fprintf(stderr,"Exiting %s (%d of %d)\n",name,rank,size);
>
>     MPI_Comm_free( &allmpi);
>     MPI_Finalize();
>     return 0;
> }
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
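For anyone trying to reproduce this, a rough sketch of the build and run steps: assuming the attached source is saved as spawner.c (a guess at the attachment's filename) and mpicc is the Open MPI wrapper compiler on your path, the three cases above correspond to:

  mpicc -o spawner spawner.c

  # case 1: whole group started by mpiexec
  mpiexec -np 3 -H op2-1 ./spawner

  # case 2: spawn children on the local host
  ./spawner `hostname`

  # case 3: spawn children on a remote host (fails as a singleton on 1.2.x)
  ./spawner op2-1

  # suggested workaround: put every intended host in the parent's allocation
  mpiexec -np 1 -H op2-1,op2-2 ./spawner op2-2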