> cat hostsfile
localhost
budgeb-sandybridge

Thanks,
Brian
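
For reference, the pattern this thread converges on - list every candidate host in a hostfile, point OMPI_MCA_orte_default_hostfile at it, and have a singleton master spawn universe_size - 1 slaves - looks roughly like the sketch below. The hostnames, paths, and the slave executable name are illustrative, it assumes Open MPI 1.3 or newer (per Ralph's note further down), and it uses MPI_Comm_get_attr rather than the older MPI_Attr_get seen in the quoted code.

// master_sketch.cpp - a minimal sketch, not the poster's actual program.
#include <mpi.h>
#include <iostream>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    // MPI_UNIVERSE_SIZE reports how many processes the runtime could host,
    // i.e. the slots it found via the default hostfile.
    int *usize_ptr = 0;
    int flag = 0;
    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &usize_ptr, &flag);
    if (!flag) {
        std::cerr << "MPI_UNIVERSE_SIZE not set" << std::endl;
        MPI_Finalize();
        return 1;
    }

    int universe_size = *usize_ptr;
    if (universe_size > 1) {
        char cmd[] = "./slave_exe";   // placeholder slave executable
        MPI_Comm intercomm;
        MPI_Comm_spawn(cmd, MPI_ARGV_NULL, universe_size - 1, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
        std::cerr << "spawned " << universe_size - 1 << " slave(s)" << std::endl;
    } else {
        std::cerr << "universe size is 1: no free slots for slaves" << std::endl;
    }

    MPI_Finalize();
    return 0;
}

Run it as a plain singleton with the hostfile exported, for example OMPI_MCA_orte_default_hostfile=/tmp/hostfile ./master_sketch (path illustrative), exactly as in the transcripts quoted below.
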
On Tue, Aug 28, 2012 at 2:36 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Hmmm...what is in your "hostsfile"?
>
> On Aug 28, 2012, at 2:33 PM, Brian Budge <brian.bu...@gmail.com> wrote:
>
>> Hi Ralph -
>>
>> Thanks for confirming this is possible. I'm trying this and currently failing. Perhaps there's something I'm missing in the code to make this work. Here are the two invocations and their outputs:
>>
>>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe
>> cannot start slaves... not enough nodes
>>
>>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile mpirun -n 1 ./master_exe
>> master spawned 1 slaves...
>> slave responding...
>>
>> The code:
>>
>> // master.cpp
>> #include <mpi.h>
>> #include <boost/filesystem.hpp>
>> #include <iostream>
>> #include <cstring>   // memcpy
>> #include <alloca.h>  // alloca
>>
>> int main(int argc, char **args) {
>>   int worldSize, universeSize, *puniverseSize, flag;
>>
>>   MPI_Comm everyone;  // intercommunicator to the spawned slaves
>>   boost::filesystem::path curPath = boost::filesystem::absolute(boost::filesystem::current_path());
>>
>>   std::string toRun = (curPath / "slave_exe").string();
>>
>>   int ret = MPI_Init(&argc, &args);
>>   if (ret != MPI_SUCCESS) {
>>     std::cerr << "failed init" << std::endl;
>>     return -1;
>>   }
>>
>>   MPI_Comm_size(MPI_COMM_WORLD, &worldSize);
>>   if (worldSize != 1) {
>>     std::cerr << "too many masters" << std::endl;
>>   }
>>
>>   MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, &flag);
>>   if (!flag) {
>>     std::cerr << "no universe size" << std::endl;
>>     return -1;
>>   }
>>   universeSize = *puniverseSize;
>>   if (universeSize == 1) {
>>     std::cerr << "cannot start slaves... not enough nodes" << std::endl;
>>   }
>>
>>   // copy the path into a writable char buffer for MPI_Comm_spawn
>>   char *buf = (char*)alloca(toRun.size() + 1);
>>   memcpy(buf, toRun.c_str(), toRun.size());
>>   buf[toRun.size()] = '\0';
>>
>>   MPI_Comm_spawn(buf, MPI_ARGV_NULL, universeSize - 1, MPI_INFO_NULL,
>>                  0, MPI_COMM_SELF, &everyone, MPI_ERRCODES_IGNORE);
>>
>>   std::cerr << "master spawned " << universeSize - 1 << " slaves..." << std::endl;
>>
>>   MPI_Finalize();
>>   return 0;
>> }
>>
>> // slave.cpp
>> #include <mpi.h>
>> #include <iostream>  // std::cerr
>>
>> int main(int argc, char **args) {
>>   int size;
>>   MPI_Comm parent;
>>   MPI_Init(&argc, &args);
>>
>>   MPI_Comm_get_parent(&parent);
>>   if (parent == MPI_COMM_NULL) {
>>     std::cerr << "slave has no parent" << std::endl;
>>   }
>>   MPI_Comm_remote_size(parent, &size);
>>   if (size != 1) {
>>     std::cerr << "parent size is " << size << std::endl;
>>   }
>>
>>   std::cerr << "slave responding..." << std::endl;
>>
>>   MPI_Finalize();
>>   return 0;
>> }
>>
>> Any ideas? Thanks for any help.
>>
>> Brian
>>
>> On Wed, Aug 22, 2012 at 9:03 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> It really is just that simple :-)
>>>
>>> On Aug 22, 2012, at 8:56 AM, Brian Budge <brian.bu...@gmail.com> wrote:
>>>
>>>> Okay. Is there a tutorial or FAQ for setting everything up? Or is it really just that simple? I don't need to run a copy of the orte server somewhere?
>>>>
>>>> If my current IP is 192.168.0.1:
>>>>
>>>> 0 > echo 192.168.0.11 > /tmp/hostfile
>>>> 1 > echo 192.168.0.12 >> /tmp/hostfile
>>>> 2 > export OMPI_MCA_orte_default_hostfile=/tmp/hostfile
>>>> 3 > ./mySpawningExe
>>>>
>>>> At this point, mySpawningExe will be the master, running on 192.168.0.1, and I can have spawned, for example, childExe on 192.168.0.11 and 192.168.0.12? Or childExe1 on 192.168.0.11 and childExe2 on 192.168.0.12?
>>>>
>>>> Thanks for the help.
>>>>
>>>> Brian
>>>>
>>>> On Wed, Aug 22, 2012 at 7:15 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> Sure, that's still true on all 1.3 or above releases. All you need to do is set the hostfile envar so we pick it up:
>>>>>
>>>>> OMPI_MCA_orte_default_hostfile=<foo>
>>>>>
>>>>> On Aug 21, 2012, at 7:23 PM, Brian Budge <brian.bu...@gmail.com> wrote:
>>>>>
>>>>>> Hi. I know this is an old thread, but I'm curious if there are any tutorials describing how to set this up? Is this still available on newer Open MPI versions?
>>>>>>
>>>>>> Thanks,
>>>>>> Brian
>>>>>>
>>>>>> On Fri, Jan 4, 2008 at 7:57 AM, Ralph Castain <r...@lanl.gov> wrote:
>>>>>>> Hi Elena
>>>>>>>
>>>>>>> I'm copying this to the user list just to correct a mis-statement on my part in an earlier message that went there. I had stated that a singleton could comm_spawn onto other nodes listed in a hostfile by setting an environmental variable that pointed us to the hostfile.
>>>>>>>
>>>>>>> This is incorrect in the 1.2 code series. That series does not allow singletons to read a hostfile at all. Hence, any comm_spawn done by a singleton can only launch child processes on the singleton's local host.
>>>>>>>
>>>>>>> This situation has been corrected for the upcoming 1.3 code series. For the 1.2 series, though, you will have to do it via an mpirun command line.
>>>>>>>
>>>>>>> Sorry for the confusion - I sometimes have too many code families to keep straight in this old mind!
>>>>>>>
>>>>>>> Ralph
>>>>>>>
>>>>>>> On 1/4/08 5:10 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> wrote:
>>>>>>>
>>>>>>>> Hello Ralph,
>>>>>>>>
>>>>>>>> Thank you very much for the explanations. But I still do not get it running...
>>>>>>>>
>>>>>>>> For the case
>>>>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe
>>>>>>>> everything works.
>>>>>>>>
>>>>>>>> For the case
>>>>>>>> ./my_master.exe
>>>>>>>> it does not.
>>>>>>>>
>>>>>>>> I did:
>>>>>>>>
>>>>>>>> - create my_hostfile and put it in $HOME/.openmpi/components/
>>>>>>>>
>>>>>>>> my_hostfile:
>>>>>>>> bollenstreek slots=2 max_slots=3
>>>>>>>> octocore01 slots=8 max_slots=8
>>>>>>>> octocore02 slots=8 max_slots=8
>>>>>>>> clstr000 slots=2 max_slots=3
>>>>>>>> clstr001 slots=2 max_slots=3
>>>>>>>> clstr002 slots=2 max_slots=3
>>>>>>>> clstr003 slots=2 max_slots=3
>>>>>>>> clstr004 slots=2 max_slots=3
>>>>>>>> clstr005 slots=2 max_slots=3
>>>>>>>> clstr006 slots=2 max_slots=3
>>>>>>>> clstr007 slots=2 max_slots=3
>>>>>>>>
>>>>>>>> - setenv OMPI_MCA_rds_hostfile_path my_hostfile (I put it in .tcshrc and then source .tcshrc)
>>>>>>>>
>>>>>>>> - in my_master.cpp I did:
>>>>>>>> MPI_Info info1;
>>>>>>>> MPI_Info_create(&info1);
>>>>>>>> char* hostname = "clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02";
>>>>>>>> MPI_Info_set(info1, "host", hostname);
>>>>>>>>
>>>>>>>> _intercomm = intracomm.Spawn("./childexe", argv1, _nProc, info1, 0, MPI_ERRCODES_IGNORE);
>>>>>>>>
>>>>>>>> - after I run the executable, I get this error message:
>>>>>>>>
>>>>>>>> bollenstreek: > ./my_master
>>>>>>>> number of processes to run: 1
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> Some of the requested hosts are not included in the current allocation for the application:
>>>>>>>> ./childexe
>>>>>>>> The requested hosts were:
>>>>>>>> clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02
>>>>>>>>
>>>>>>>> Verify that you have mapped the allocated resources properly using the --host specification.
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file base/rmaps_base_support_fns.c at line 225
>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file rmaps_rr.c at line 478
>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file base/rmaps_base_map_job.c at line 210
>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file rmgr_urm.c at line 372
>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file communicator/comm_dyn.c at line 608
>>>>>>>>
>>>>>>>> Did I miss something? Thanks for help!
>>>>>>>>
>>>>>>>> Elena
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Ralph H Castain [mailto:r...@lanl.gov]
>>>>>>>> Sent: Tuesday, December 18, 2007 3:50 PM
>>>>>>>> To: Elena Zhebel; Open MPI Users <us...@open-mpi.org>
>>>>>>>> Cc: Ralph H Castain
>>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration
>>>>>>>>
>>>>>>>> On 12/18/07 7:35 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks a lot! Now it works! The solution is to use mpirun -n 1 -hostfile my.hosts *.exe and pass an MPI_Info key to the Spawn function!
>>>>>>>>>
>>>>>>>>> One more question: is it necessary to start my "master" program with
>>>>>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe ?
>>>>>>>>
>>>>>>>> No, it isn't necessary - assuming that my_master_host is the first host listed in your hostfile! If you are only executing one my_master.exe (i.e., you gave -n 1 to mpirun), then we will automatically map that process onto the first host in your hostfile.
>>>>>>>>
>>>>>>>> If you want my_master.exe to go on some host other than the first one in the file, then you have to give us the -host option.
>>>>>>>>
>>>>>>>>> Are there other possibilities for an easy start? I would say just run ./my_master.exe, but then the master process doesn't know about the hosts available in the network.
>>>>>>>>
>>>>>>>> You can set the hostfile parameter in your environment instead of on the command line. Just set OMPI_MCA_rds_hostfile_path=my.hosts.
>>>>>>>>
>>>>>>>> You can then just run ./my_master.exe on the host where you want the master to reside - everything should work the same.
>>>>>>>>
>>>>>>>> Just as an FYI: the name of that environmental variable is going to change in the 1.3 release, but everything will still work the same.
>>>>>>>>
>>>>>>>> Hope that helps
>>>>>>>> Ralph
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks and regards,
>>>>>>>>> Elena
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Ralph H Castain [mailto:r...@lanl.gov]
>>>>>>>>> Sent: Monday, December 17, 2007 5:49 PM
>>>>>>>>> To: Open MPI Users <us...@open-mpi.org>; Elena Zhebel
>>>>>>>>> Cc: Ralph H Castain
>>>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration
>>>>>>>>>
>>>>>>>>> On 12/17/07 8:19 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello Ralph,
>>>>>>>>>>
>>>>>>>>>> Thank you for your answer.
>>>>>>>>>>
>>>>>>>>>> I'm using OpenMPI 1.2.3, compiler glibc232, Linux SuSE 10.0. My "master" executable runs only on the one local host; it then spawns "slaves" (with MPI::Intracomm::Spawn). My question was: how do I determine the hosts where these "slaves" will be spawned? You said: "You have to specify all of the hosts that can be used by your job in the original hostfile". How can I specify the host file? I cannot find it in the documentation.
>>>>>>>>>
>>>>>>>>> Hmmm...sorry about the lack of documentation. I always assumed that the MPI folks in the project would document such things since it has little to do with the underlying run-time, but I guess that fell through the cracks.
>>>>>>>>>
>>>>>>>>> There are two parts to your question:
>>>>>>>>>
>>>>>>>>> 1. How to specify the hosts to be used for the entire job. I believe that is somewhat covered here:
>>>>>>>>> http://www.open-mpi.org/faq/?category=running#simple-spmd-run
>>>>>>>>> That FAQ tells you what a hostfile should look like, though you may already know that. Basically, we require that you list -all- of the nodes that both your master and slave programs will use.
>>>>>>>>>
>>>>>>>>> 2. How to specify which nodes are available for the master, and which for the slave.
>>>>>>>>>
>>>>>>>>> You would specify the host for your master on the mpirun command line with something like:
>>>>>>>>>
>>>>>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe
>>>>>>>>>
>>>>>>>>> This directs Open MPI to map that specified executable onto the specified host - note that my_master_host must have been in my_hostfile.
>>>>>>>>>
>>>>>>>>> Inside your master, you would create an MPI_Info key "host" that has a value consisting of a string "host1,host2,host3" identifying the hosts you want your slave to execute upon. Those hosts must have been included in my_hostfile. Include that key in the MPI_Info array passed to your Spawn.
>>>>>>>>>
>>>>>>>>> We don't currently support providing a hostfile for the slaves (as opposed to the host-at-a-time string above). This may become available in a future release - TBD.
>>>>>>>>>
>>>>>>>>> Hope that helps
>>>>>>>>> Ralph
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks and regards,
>>>>>>>>>> Elena
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph H Castain
>>>>>>>>>> Sent: Monday, December 17, 2007 3:31 PM
>>>>>>>>>> To: Open MPI Users <us...@open-mpi.org>
>>>>>>>>>> Cc: Ralph H Castain
>>>>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration
>>>>>>>>>>
>>>>>>>>>> On 12/12/07 5:46 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I'm working on an MPI application where I'm using OpenMPI instead of MPICH.
>>>>>>>>>>>
>>>>>>>>>>> In my "master" program I call the function MPI::Intracomm::Spawn, which spawns "slave" processes. It is not clear to me how to spawn the "slave" processes over the network. Currently the "master" creates "slaves" on the same host.
>>>>>>>>>>>
>>>>>>>>>>> If I use 'mpirun --hostfile openmpi.hosts' then processes are spawned over the network as expected. But now I need to spawn processes over the network from my own executable using MPI::Intracomm::Spawn - how can I achieve that?
>>>>>>>>>>
>>>>>>>>>> I'm not sure from your description exactly what you are trying to do, nor in what environment this is all operating within or what version of Open MPI you are using. Setting aside the environment and version issue, I'm guessing that you are running your executable over some specified set of hosts, but want to provide a different hostfile that specifies the hosts to be used for the "slave" processes. Correct?
>>>>>>>>>>
>>>>>>>>>> If that is correct, then I'm afraid you can't do that in any version of Open MPI today. You have to specify all of the hosts that can be used by your job in the original hostfile. You can then specify a subset of those hosts to be used by your original "master" program, and then specify a different subset to be used by the "slaves" when calling Spawn.
>>>>>>>>>>
>>>>>>>>>> But the system requires that you tell it -all- of the hosts that are going to be used at the beginning of the job.
>>>>>>>>>>
>>>>>>>>>> At the moment, there is no plan to remove that requirement, though there has been occasional discussion about doing so at some point in the future.
>>>>>>>>>> No promises that it will happen, though - managed environments, in particular, currently object to the idea of changing the allocation on-the-fly. We may, though, make a provision for purely hostfile-based environments (i.e., unmanaged) at some time in the future.
>>>>>>>>>>
>>>>>>>>>> Ralph
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks in advance for any help.
>>>>>>>>>>>
>>>>>>>>>>> Elena
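
To make the "host" info key that Ralph describes above concrete, here is a minimal sketch using the MPI C API (the C++ bindings Elena uses work the same way). The host names, the slave count, and ./childexe are illustrative; every host listed must also appear in the hostfile the job was launched with.

// spawn_on_hosts.cpp - illustrative sketch of the "host" info-key approach.
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    // Build an MPI_Info carrying a comma-separated list of target hosts.
    MPI_Info info;
    MPI_Info_create(&info);
    char key[]   = "host";                          // named buffers: older C bindings take non-const char*
    char hosts[] = "clstr002,clstr003,octocore01";  // illustrative host names
    MPI_Info_set(info, key, hosts);

    char cmd[] = "./childexe";                      // illustrative slave executable
    int nslaves = 3;                                // slaves are mapped onto the listed hosts
    MPI_Comm intercomm;
    MPI_Comm_spawn(cmd, MPI_ARGV_NULL, nslaves, info,
                   0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}

Starting the master with something like mpirun -n 1 -hostfile my_hostfile ./my_master, as discussed above, is what puts those hosts into the allocation that Spawn can draw from.
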