Not off the top of my head. However, as noted earlier, there is absolutely no advantage to a singleton vs. mpirun start - all the singleton does is immediately fork/exec "mpirun" to support the rest of the job. In both cases, you have a daemon running the job - the only difference is the number of characters the user types to start it.
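For readers who just want the pattern this thread converges on: list every host the job may use in a hostfile, point OMPI_MCA_orte_default_hostfile at it, start the master as a plain singleton, and size the spawn explicitly rather than reading MPI_UNIVERSE_SIZE (which, as explained further down, is always 1 for a singleton because the attribute is fixed before the orte daemon ever reads the hostfile). The code below is a minimal sketch of that pattern, not code from the thread: the file name and child count are placeholder assumptions, ./slave_exe is the slave binary built from the example later in the thread, and on 1.6.1 a cross-machine singleton spawn still hits the hang being investigated above.

// spawn_master_sketch.cpp - illustrative sketch; file name and child count are assumptions
#include <mpi.h>
#include <iostream>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    // MPI_UNIVERSE_SIZE is fixed at 1 in a singleton, so request an explicit
    // number of children instead of deriving it from that attribute.
    const int nChildren = 2;  // placeholder

    // The thread's master.cpp builds an absolute path here instead, which is
    // safer when the children land on other nodes.
    char cmd[] = "./slave_exe";

    MPI_Comm children;
    MPI_Comm_spawn(cmd, MPI_ARGV_NULL, nChildren, MPI_INFO_NULL,
                   0 /* root */, MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);

    std::cerr << "spawned " << nChildren << " children" << std::endl;

    // MPI_Finalize is collective over the connected parent and children,
    // mirroring how master.cpp and slave.cpp in the thread shut down.
    MPI_Finalize();
    return 0;
}

Started as a singleton, the invocation would look something like

OMPI_MCA_orte_default_hostfile=/tmp/hostfile ./spawn_master_sketch

with /tmp/hostfile listing the hosts, as in the two-line echo recipe further down in the thread.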
On Aug 30, 2012, at 8:44 AM, Brian Budge <brian.bu...@gmail.com> wrote: > In the event that I need to get this up-and-running soon (I do need > something working within 2 weeks), can you recommend an older version > where this is expected to work? > > Thanks, > Brian > > On Tue, Aug 28, 2012 at 4:58 PM, Brian Budge <brian.bu...@gmail.com> wrote: >> Thanks! >> >> On Tue, Aug 28, 2012 at 4:57 PM, Ralph Castain <r...@open-mpi.org> wrote: >>> Yeah, I'm seeing the hang as well when running across multiple machines. >>> Let me dig a little and get this fixed. >>> >>> Thanks >>> Ralph >>> >>> On Aug 28, 2012, at 4:51 PM, Brian Budge <brian.bu...@gmail.com> wrote: >>> >>>> Hmmm, I went to the build directories of openmpi for my two machines, >>>> went into the orte/test/mpi directory and made the executables on both >>>> machines. I set the hostsfile in the env variable on the "master" >>>> machine. >>>> >>>> Here's the output: >>>> >>>> OMPI_MCA_orte_default_hostfile=/home/budgeb/p4/pseb/external/install/openmpi-1.6.1/orte/test/mpi/hostsfile >>>> ./simple_spawn >>>> Parent [pid 97504] starting up! >>>> 0 completed MPI_Init >>>> Parent [pid 97504] about to spawn! >>>> Parent [pid 97507] starting up! >>>> Parent [pid 97508] starting up! >>>> Parent [pid 30626] starting up! >>>> ^C >>>> zsh: interrupt OMPI_MCA_orte_default_hostfile= ./simple_spawn >>>> >>>> I had to ^C to kill the hung process. >>>> >>>> When I run using mpirun: >>>> >>>> OMPI_MCA_orte_default_hostfile=/home/budgeb/p4/pseb/external/install/openmpi-1.6.1/orte/test/mpi/hostsfile >>>> mpirun -np 1 ./simple_spawn >>>> Parent [pid 97511] starting up! >>>> 0 completed MPI_Init >>>> Parent [pid 97511] about to spawn! >>>> Parent [pid 97513] starting up! >>>> Parent [pid 30762] starting up! >>>> Parent [pid 30764] starting up! >>>> Parent done with spawn >>>> Parent sending message to child >>>> 1 completed MPI_Init >>>> Hello from the child 1 of 3 on host budgeb-sandybridge pid 97513 >>>> 0 completed MPI_Init >>>> Hello from the child 0 of 3 on host budgeb-interlagos pid 30762 >>>> 2 completed MPI_Init >>>> Hello from the child 2 of 3 on host budgeb-interlagos pid 30764 >>>> Child 1 disconnected >>>> Child 0 received msg: 38 >>>> Child 0 disconnected >>>> Parent disconnected >>>> Child 2 disconnected >>>> 97511: exiting >>>> 97513: exiting >>>> 30762: exiting >>>> 30764: exiting >>>> >>>> As you can see, I'm using openmpi v 1.6.1. I just barely freshly >>>> installed on both machines using the default configure options. >>>> >>>> Thanks for all your help. >>>> >>>> Brian >>>> >>>> On Tue, Aug 28, 2012 at 4:39 PM, Ralph Castain <r...@open-mpi.org> wrote: >>>>> Looks to me like it didn't find your executable - could be a question of >>>>> where it exists relative to where you are running. If you look in your >>>>> OMPI source tree at the orte/test/mpi directory, you'll see an example >>>>> program "simple_spawn.c" there. Just "make simple_spawn" and execute that >>>>> with your default hostfile set - does it work okay? >>>>> >>>>> It works fine for me, hence the question. >>>>> >>>>> Also, what OMPI version are you using? >>>>> >>>>> On Aug 28, 2012, at 4:25 PM, Brian Budge <brian.bu...@gmail.com> wrote: >>>>> >>>>>> I see. Okay. So, I just tried removing the check for universe size, >>>>>> and set the universe size to 2. 
Here's my output: >>>>>> >>>>>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib >>>>>> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe >>>>>> [budgeb-interlagos:29965] [[4156,0],0] ORTE_ERROR_LOG: Fatal in file >>>>>> base/plm_base_receive.c at line 253 >>>>>> [budgeb-interlagos:29963] [[4156,1],0] ORTE_ERROR_LOG: The specified >>>>>> application failed to start in file dpm_orte.c at line 785 >>>>>> >>>>>> The corresponding run with mpirun still works. >>>>>> >>>>>> Thanks, >>>>>> Brian >>>>>> >>>>>> On Tue, Aug 28, 2012 at 2:46 PM, Ralph Castain <r...@open-mpi.org> wrote: >>>>>>> I see the issue - it's here: >>>>>>> >>>>>>>> MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, &flag); >>>>>>>> >>>>>>>> if(!flag) { >>>>>>>> std::cerr << "no universe size" << std::endl; >>>>>>>> return -1; >>>>>>>> } >>>>>>>> universeSize = *puniverseSize; >>>>>>>> if(universeSize == 1) { >>>>>>>> std::cerr << "cannot start slaves... not enough nodes" << std::endl; >>>>>>>> } >>>>>>> >>>>>>> The universe size is set to 1 on a singleton because the attribute gets >>>>>>> set at the beginning of time - we haven't any way to go back and change >>>>>>> it. The sequence of events explains why. The singleton starts up and >>>>>>> sets its attributes, including universe_size. It also spins off an orte >>>>>>> daemon to act as its own private "mpirun" in case you call comm_spawn. >>>>>>> At this point, however, no hostfile has been read - the singleton is >>>>>>> just an MPI proc doing its own thing, and the orte daemon is just >>>>>>> sitting there on "stand-by". >>>>>>> >>>>>>> When your app calls comm_spawn, then the orte daemon gets called to >>>>>>> launch the new procs. At that time, it (not the original singleton!) >>>>>>> reads the hostfile to find out how many nodes are around, and then does >>>>>>> the launch. >>>>>>> >>>>>>> You are trying to check the number of nodes from within the singleton, >>>>>>> which won't work - it has no way of discovering that info. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Aug 28, 2012, at 2:38 PM, Brian Budge <brian.bu...@gmail.com> wrote: >>>>>>> >>>>>>>>> echo hostsfile >>>>>>>> localhost >>>>>>>> budgeb-sandybridge >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Brian >>>>>>>> >>>>>>>> On Tue, Aug 28, 2012 at 2:36 PM, Ralph Castain <r...@open-mpi.org> >>>>>>>> wrote: >>>>>>>>> Hmmm...what is in your "hostsfile"? >>>>>>>>> >>>>>>>>> On Aug 28, 2012, at 2:33 PM, Brian Budge <brian.bu...@gmail.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi Ralph - >>>>>>>>>> >>>>>>>>>> Thanks for confirming this is possible. I'm trying this and >>>>>>>>>> currently >>>>>>>>>> failing. Perhaps there's something I'm missing in the code to make >>>>>>>>>> this work. Here are the two instantiations and their outputs: >>>>>>>>>> >>>>>>>>>>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib >>>>>>>>>>> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe >>>>>>>>>> cannot start slaves... not enough nodes >>>>>>>>>> >>>>>>>>>>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib >>>>>>>>>>> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile mpirun -n 1 >>>>>>>>>>> ./master_exe >>>>>>>>>> master spawned 1 slaves... >>>>>>>>>> slave responding... 
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> The code: >>>>>>>>>> >>>>>>>>>> //master.cpp >>>>>>>>>> #include <mpi.h> >>>>>>>>>> #include <boost/filesystem.hpp> >>>>>>>>>> #include <iostream> >>>>>>>>>> >>>>>>>>>> int main(int argc, char **args) { >>>>>>>>>> int worldSize, universeSize, *puniverseSize, flag; >>>>>>>>>> >>>>>>>>>> MPI_Comm everyone; //intercomm >>>>>>>>>> boost::filesystem::path curPath = >>>>>>>>>> boost::filesystem::absolute(boost::filesystem::current_path()); >>>>>>>>>> >>>>>>>>>> std::string toRun = (curPath / "slave_exe").string(); >>>>>>>>>> >>>>>>>>>> int ret = MPI_Init(&argc, &args); >>>>>>>>>> >>>>>>>>>> if(ret != MPI_SUCCESS) { >>>>>>>>>> std::cerr << "failed init" << std::endl; >>>>>>>>>> return -1; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> MPI_Comm_size(MPI_COMM_WORLD, &worldSize); >>>>>>>>>> >>>>>>>>>> if(worldSize != 1) { >>>>>>>>>> std::cerr << "too many masters" << std::endl; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, >>>>>>>>>> &flag); >>>>>>>>>> >>>>>>>>>> if(!flag) { >>>>>>>>>> std::cerr << "no universe size" << std::endl; >>>>>>>>>> return -1; >>>>>>>>>> } >>>>>>>>>> universeSize = *puniverseSize; >>>>>>>>>> if(universeSize == 1) { >>>>>>>>>> std::cerr << "cannot start slaves... not enough nodes" << >>>>>>>>>> std::endl; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> char *buf = (char*)alloca(toRun.size() + 1); >>>>>>>>>> memcpy(buf, toRun.c_str(), toRun.size()); >>>>>>>>>> buf[toRun.size()] = '\0'; >>>>>>>>>> >>>>>>>>>> MPI_Comm_spawn(buf, MPI_ARGV_NULL, universeSize-1, MPI_INFO_NULL, >>>>>>>>>> 0, MPI_COMM_SELF, &everyone, >>>>>>>>>> MPI_ERRCODES_IGNORE); >>>>>>>>>> >>>>>>>>>> std::cerr << "master spawned " << universeSize-1 << " slaves..." >>>>>>>>>> << std::endl; >>>>>>>>>> >>>>>>>>>> MPI_Finalize(); >>>>>>>>>> >>>>>>>>>> return 0; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> //slave.cpp >>>>>>>>>> #include <mpi.h> >>>>>>>>>> >>>>>>>>>> int main(int argc, char **args) { >>>>>>>>>> int size; >>>>>>>>>> MPI_Comm parent; >>>>>>>>>> MPI_Init(&argc, &args); >>>>>>>>>> >>>>>>>>>> MPI_Comm_get_parent(&parent); >>>>>>>>>> >>>>>>>>>> if(parent == MPI_COMM_NULL) { >>>>>>>>>> std::cerr << "slave has no parent" << std::endl; >>>>>>>>>> } >>>>>>>>>> MPI_Comm_remote_size(parent, &size); >>>>>>>>>> if(size != 1) { >>>>>>>>>> std::cerr << "parent size is " << size << std::endl; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> std::cerr << "slave responding..." << std::endl; >>>>>>>>>> >>>>>>>>>> MPI_Finalize(); >>>>>>>>>> >>>>>>>>>> return 0; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Any ideas? Thanks for any help. >>>>>>>>>> >>>>>>>>>> Brian >>>>>>>>>> >>>>>>>>>> On Wed, Aug 22, 2012 at 9:03 AM, Ralph Castain <r...@open-mpi.org> >>>>>>>>>> wrote: >>>>>>>>>>> It really is just that simple :-) >>>>>>>>>>> >>>>>>>>>>> On Aug 22, 2012, at 8:56 AM, Brian Budge <brian.bu...@gmail.com> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Okay. Is there a tutorial or FAQ for setting everything up? Or >>>>>>>>>>>> is it >>>>>>>>>>>> really just that simple? I don't need to run a copy of the orte >>>>>>>>>>>> server somewhere? 
>>>>>>>>>>>> >>>>>>>>>>>> if my current ip is 192.168.0.1, >>>>>>>>>>>> >>>>>>>>>>>> 0 > echo 192.168.0.11 > /tmp/hostfile >>>>>>>>>>>> 1 > echo 192.168.0.12 >> /tmp/hostfile >>>>>>>>>>>> 2 > export OMPI_MCA_orte_default_hostfile=/tmp/hostfile >>>>>>>>>>>> 3 > ./mySpawningExe >>>>>>>>>>>> >>>>>>>>>>>> At this point, mySpawningExe will be the master, running on >>>>>>>>>>>> 192.168.0.1, and I can have spawned, for example, childExe on >>>>>>>>>>>> 192.168.0.11 and 192.168.0.12? Or childExe1 on 192.168.0.11 and >>>>>>>>>>>> childExe2 on 192.168.0.12? >>>>>>>>>>>> >>>>>>>>>>>> Thanks for the help. >>>>>>>>>>>> >>>>>>>>>>>> Brian >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Aug 22, 2012 at 7:15 AM, Ralph Castain <r...@open-mpi.org> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> Sure, that's still true on all 1.3 or above releases. All you >>>>>>>>>>>>> need to do is set the hostfile envar so we pick it up: >>>>>>>>>>>>> >>>>>>>>>>>>> OMPI_MCA_orte_default_hostfile=<foo> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Aug 21, 2012, at 7:23 PM, Brian Budge <brian.bu...@gmail.com> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi. I know this is an old thread, but I'm curious if there are >>>>>>>>>>>>>> any >>>>>>>>>>>>>> tutorials describing how to set this up? Is this still >>>>>>>>>>>>>> available on >>>>>>>>>>>>>> newer open mpi versions? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Brian >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Jan 4, 2008 at 7:57 AM, Ralph Castain <r...@lanl.gov> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> Hi Elena >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'm copying this to the user list just to correct a >>>>>>>>>>>>>>> mis-statement on my part >>>>>>>>>>>>>>> in an earlier message that went there. I had stated that a >>>>>>>>>>>>>>> singleton could >>>>>>>>>>>>>>> comm_spawn onto other nodes listed in a hostfile by setting an >>>>>>>>>>>>>>> environmental >>>>>>>>>>>>>>> variable that pointed us to the hostfile. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This is incorrect in the 1.2 code series. That series does not >>>>>>>>>>>>>>> allow >>>>>>>>>>>>>>> singletons to read a hostfile at all. Hence, any comm_spawn >>>>>>>>>>>>>>> done by a >>>>>>>>>>>>>>> singleton can only launch child processes on the singleton's >>>>>>>>>>>>>>> local host. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This situation has been corrected for the upcoming 1.3 code >>>>>>>>>>>>>>> series. For the >>>>>>>>>>>>>>> 1.2 series, though, you will have to do it via an mpirun >>>>>>>>>>>>>>> command line. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Sorry for the confusion - I sometimes have too many code >>>>>>>>>>>>>>> families to keep >>>>>>>>>>>>>>> straight in this old mind! >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Ralph >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 1/4/08 5:10 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hello Ralph, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thank you very much for the explanations. >>>>>>>>>>>>>>>> But I still do not get it running... >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> For the case >>>>>>>>>>>>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host >>>>>>>>>>>>>>>> my_master.exe >>>>>>>>>>>>>>>> everything works. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> For the case >>>>>>>>>>>>>>>> ./my_master.exe >>>>>>>>>>>>>>>> it does not. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I did: >>>>>>>>>>>>>>>> - create my_hostfile and put it in the >>>>>>>>>>>>>>>> $HOME/.openmpi/components/ >>>>>>>>>>>>>>>> my_hostfile : >>>>>>>>>>>>>>>> bollenstreek slots=2 max_slots=3 >>>>>>>>>>>>>>>> octocore01 slots=8 max_slots=8 >>>>>>>>>>>>>>>> octocore02 slots=8 max_slots=8 >>>>>>>>>>>>>>>> clstr000 slots=2 max_slots=3 >>>>>>>>>>>>>>>> clstr001 slots=2 max_slots=3 >>>>>>>>>>>>>>>> clstr002 slots=2 max_slots=3 >>>>>>>>>>>>>>>> clstr003 slots=2 max_slots=3 >>>>>>>>>>>>>>>> clstr004 slots=2 max_slots=3 >>>>>>>>>>>>>>>> clstr005 slots=2 max_slots=3 >>>>>>>>>>>>>>>> clstr006 slots=2 max_slots=3 >>>>>>>>>>>>>>>> clstr007 slots=2 max_slots=3 >>>>>>>>>>>>>>>> - setenv OMPI_MCA_rds_hostfile_path my_hostfile (I put it in >>>>>>>>>>>>>>>> .tcshrc and >>>>>>>>>>>>>>>> then source .tcshrc) >>>>>>>>>>>>>>>> - in my_master.cpp I did >>>>>>>>>>>>>>>> MPI_Info info1; >>>>>>>>>>>>>>>> MPI_Info_create(&info1); >>>>>>>>>>>>>>>> char* hostname = >>>>>>>>>>>>>>>> "clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02"; >>>>>>>>>>>>>>>> MPI_Info_set(info1, "host", hostname); >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> _intercomm = intracomm.Spawn("./childexe", argv1, _nProc, >>>>>>>>>>>>>>>> info1, 0, >>>>>>>>>>>>>>>> MPI_ERRCODES_IGNORE); >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> - After I call the executable, I've got this error message >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> bollenstreek: > ./my_master >>>>>>>>>>>>>>>> number of processes to run: 1 >>>>>>>>>>>>>>>> -------------------------------------------------------------------------- >>>>>>>>>>>>>>>> Some of the requested hosts are not included in the current >>>>>>>>>>>>>>>> allocation for >>>>>>>>>>>>>>>> the application: >>>>>>>>>>>>>>>> ./childexe >>>>>>>>>>>>>>>> The requested hosts were: >>>>>>>>>>>>>>>> clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Verify that you have mapped the allocated resources properly >>>>>>>>>>>>>>>> using the >>>>>>>>>>>>>>>> --host specification. >>>>>>>>>>>>>>>> -------------------------------------------------------------------------- >>>>>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource >>>>>>>>>>>>>>>> in file >>>>>>>>>>>>>>>> base/rmaps_base_support_fns.c at line 225 >>>>>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource >>>>>>>>>>>>>>>> in file >>>>>>>>>>>>>>>> rmaps_rr.c at line 478 >>>>>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource >>>>>>>>>>>>>>>> in file >>>>>>>>>>>>>>>> base/rmaps_base_map_job.c at line 210 >>>>>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource >>>>>>>>>>>>>>>> in file >>>>>>>>>>>>>>>> rmgr_urm.c at line 372 >>>>>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource >>>>>>>>>>>>>>>> in file >>>>>>>>>>>>>>>> communicator/comm_dyn.c at line 608 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Did I miss something? >>>>>>>>>>>>>>>> Thanks for help! 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Elena >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>>>>> From: Ralph H Castain [mailto:r...@lanl.gov] >>>>>>>>>>>>>>>> Sent: Tuesday, December 18, 2007 3:50 PM >>>>>>>>>>>>>>>> To: Elena Zhebel; Open MPI Users <us...@open-mpi.org> >>>>>>>>>>>>>>>> Cc: Ralph H Castain >>>>>>>>>>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster >>>>>>>>>>>>>>>> configuration >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 12/18/07 7:35 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks a lot! Now it works! >>>>>>>>>>>>>>>>> The solution is to use mpirun -n 1 -hostfile my.hosts *.exe >>>>>>>>>>>>>>>>> and pass >>>>>>>>>>>>>>>> MPI_Info >>>>>>>>>>>>>>>>> Key to the Spawn function! >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> One more question: is it necessary to start my "master" >>>>>>>>>>>>>>>>> program with >>>>>>>>>>>>>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host >>>>>>>>>>>>>>>>> my_master.exe ? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> No, it isn't necessary - assuming that my_master_host is the >>>>>>>>>>>>>>>> first host >>>>>>>>>>>>>>>> listed in your hostfile! If you are only executing one >>>>>>>>>>>>>>>> my_master.exe (i.e., >>>>>>>>>>>>>>>> you gave -n 1 to mpirun), then we will automatically map that >>>>>>>>>>>>>>>> process onto >>>>>>>>>>>>>>>> the first host in your hostfile. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> If you want my_master.exe to go on someone other than the >>>>>>>>>>>>>>>> first host in the >>>>>>>>>>>>>>>> file, then you have to give us the -host option. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Are there other possibilities for easy start? >>>>>>>>>>>>>>>>> I would say just to run ./my_master.exe , but then the master >>>>>>>>>>>>>>>>> process >>>>>>>>>>>>>>>> doesn't >>>>>>>>>>>>>>>>> know about the available in the network hosts. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> You can set the hostfile parameter in your environment instead >>>>>>>>>>>>>>>> of on the >>>>>>>>>>>>>>>> command line. Just set OMPI_MCA_rds_hostfile_path = my.hosts. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> You can then just run ./my_master.exe on the host where you >>>>>>>>>>>>>>>> want the master >>>>>>>>>>>>>>>> to reside - everything should work the same. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Just as an FYI: the name of that environmental variable is >>>>>>>>>>>>>>>> going to change >>>>>>>>>>>>>>>> in the 1.3 release, but everything will still work the same. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hope that helps >>>>>>>>>>>>>>>> Ralph >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks and regards, >>>>>>>>>>>>>>>>> Elena >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>>>>>> From: Ralph H Castain [mailto:r...@lanl.gov] >>>>>>>>>>>>>>>>> Sent: Monday, December 17, 2007 5:49 PM >>>>>>>>>>>>>>>>> To: Open MPI Users <us...@open-mpi.org>; Elena Zhebel >>>>>>>>>>>>>>>>> Cc: Ralph H Castain >>>>>>>>>>>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster >>>>>>>>>>>>>>>>> configuration >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 12/17/07 8:19 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hello Ralph, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thank you for your answer. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I'm using OpenMPI 1.2.3. 
, compiler glibc232, Linux Suse >>>>>>>>>>>>>>>>>> 10.0. >>>>>>>>>>>>>>>>>> My "master" executable runs only on the one local host, then >>>>>>>>>>>>>>>>>> it spawns >>>>>>>>>>>>>>>>>> "slaves" (with MPI::Intracomm::Spawn). >>>>>>>>>>>>>>>>>> My question was: how to determine the hosts where these >>>>>>>>>>>>>>>>>> "slaves" will be >>>>>>>>>>>>>>>>>> spawned? >>>>>>>>>>>>>>>>>> You said: "You have to specify all of the hosts that can be >>>>>>>>>>>>>>>>>> used by >>>>>>>>>>>>>>>>>> your job >>>>>>>>>>>>>>>>>> in the original hostfile". How can I specify the host file? >>>>>>>>>>>>>>>>>> I can not >>>>>>>>>>>>>>>>>> find it >>>>>>>>>>>>>>>>>> in the documentation. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hmmm...sorry about the lack of documentation. I always >>>>>>>>>>>>>>>>> assumed that the MPI >>>>>>>>>>>>>>>>> folks in the project would document such things since it has >>>>>>>>>>>>>>>>> little to do >>>>>>>>>>>>>>>>> with the underlying run-time, but I guess that fell through >>>>>>>>>>>>>>>>> the cracks. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> There are two parts to your question: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 1. how to specify the hosts to be used for the entire job. I >>>>>>>>>>>>>>>>> believe that >>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>> somewhat covered here: >>>>>>>>>>>>>>>>> http://www.open-mpi.org/faq/?category=running#simple-spmd-run >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> That FAQ tells you what a hostfile should look like, though >>>>>>>>>>>>>>>>> you may already >>>>>>>>>>>>>>>>> know that. Basically, we require that you list -all- of the >>>>>>>>>>>>>>>>> nodes that both >>>>>>>>>>>>>>>>> your master and slave programs will use. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2. how to specify which nodes are available for the master, >>>>>>>>>>>>>>>>> and which for >>>>>>>>>>>>>>>>> the slave. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> You would specify the host for your master on the mpirun >>>>>>>>>>>>>>>>> command line with >>>>>>>>>>>>>>>>> something like: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host >>>>>>>>>>>>>>>>> my_master.exe >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> This directs Open MPI to map that specified executable on the >>>>>>>>>>>>>>>>> specified >>>>>>>>>>>>>>>> host >>>>>>>>>>>>>>>>> - note that my_master_host must have been in my_hostfile. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Inside your master, you would create an MPI_Info key "host" >>>>>>>>>>>>>>>>> that has a >>>>>>>>>>>>>>>> value >>>>>>>>>>>>>>>>> consisting of a string "host1,host2,host3" identifying the >>>>>>>>>>>>>>>>> hosts you want >>>>>>>>>>>>>>>>> your slave to execute upon. Those hosts must have been >>>>>>>>>>>>>>>>> included in >>>>>>>>>>>>>>>>> my_hostfile. Include that key in the MPI_Info array passed to >>>>>>>>>>>>>>>>> your Spawn. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> We don't currently support providing a hostfile for the >>>>>>>>>>>>>>>>> slaves (as opposed >>>>>>>>>>>>>>>>> to the host-at-a-time string above). This may become >>>>>>>>>>>>>>>>> available in a future >>>>>>>>>>>>>>>>> release - TBD. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hope that helps >>>>>>>>>>>>>>>>> Ralph >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks and regards, >>>>>>>>>>>>>>>>>> Elena >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>>>>>>> From: users-boun...@open-mpi.org >>>>>>>>>>>>>>>>>> [mailto:users-boun...@open-mpi.org] On >>>>>>>>>>>>>>>>>> Behalf Of Ralph H Castain >>>>>>>>>>>>>>>>>> Sent: Monday, December 17, 2007 3:31 PM >>>>>>>>>>>>>>>>>> To: Open MPI Users <us...@open-mpi.org> >>>>>>>>>>>>>>>>>> Cc: Ralph H Castain >>>>>>>>>>>>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster >>>>>>>>>>>>>>>>>> configuration >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On 12/12/07 5:46 AM, "Elena Zhebel" >>>>>>>>>>>>>>>>>> <ezhe...@fugro-jason.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I'm working on a MPI application where I'm using OpenMPI >>>>>>>>>>>>>>>>>>> instead of >>>>>>>>>>>>>>>>>>> MPICH. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> In my "master" program I call the function >>>>>>>>>>>>>>>>>>> MPI::Intracomm::Spawn which >>>>>>>>>>>>>>>>>> spawns >>>>>>>>>>>>>>>>>>> "slave" processes. It is not clear for me how to spawn the >>>>>>>>>>>>>>>>>>> "slave" >>>>>>>>>>>>>>>>>> processes >>>>>>>>>>>>>>>>>>> over the network. Currently "master" creates "slaves" on >>>>>>>>>>>>>>>>>>> the same >>>>>>>>>>>>>>>>>>> host. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> If I use 'mpirun --hostfile openmpi.hosts' then processes >>>>>>>>>>>>>>>>>>> are spawn >>>>>>>>>>>>>>>>>>> over >>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>> network as expected. But now I need to spawn processes over >>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>> network >>>>>>>>>>>>>>>>>> from >>>>>>>>>>>>>>>>>>> my own executable using MPI::Intracomm::Spawn, how can I >>>>>>>>>>>>>>>>>>> achieve it? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I'm not sure from your description exactly what you are >>>>>>>>>>>>>>>>>> trying to do, >>>>>>>>>>>>>>>>>> nor in >>>>>>>>>>>>>>>>>> what environment this is all operating within or what >>>>>>>>>>>>>>>>>> version of Open >>>>>>>>>>>>>>>>>> MPI >>>>>>>>>>>>>>>>>> you are using. Setting aside the environment and version >>>>>>>>>>>>>>>>>> issue, I'm >>>>>>>>>>>>>>>>>> guessing >>>>>>>>>>>>>>>>>> that you are running your executable over some specified set >>>>>>>>>>>>>>>>>> of hosts, >>>>>>>>>>>>>>>>>> but >>>>>>>>>>>>>>>>>> want to provide a different hostfile that specifies the >>>>>>>>>>>>>>>>>> hosts to be >>>>>>>>>>>>>>>>>> used for >>>>>>>>>>>>>>>>>> the "slave" processes. Correct? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> If that is correct, then I'm afraid you can't do that in any >>>>>>>>>>>>>>>>>> version >>>>>>>>>>>>>>>>>> of Open >>>>>>>>>>>>>>>>>> MPI today. You have to specify all of the hosts that can be >>>>>>>>>>>>>>>>>> used by >>>>>>>>>>>>>>>>>> your job >>>>>>>>>>>>>>>>>> in the original hostfile. You can then specify a subset of >>>>>>>>>>>>>>>>>> those hosts >>>>>>>>>>>>>>>>>> to be >>>>>>>>>>>>>>>>>> used by your original "master" program, and then specify a >>>>>>>>>>>>>>>>>> different >>>>>>>>>>>>>>>>>> subset >>>>>>>>>>>>>>>>>> to be used by the "slaves" when calling Spawn. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> But the system requires that you tell it -all- of the hosts >>>>>>>>>>>>>>>>>> that are >>>>>>>>>>>>>>>>>> going >>>>>>>>>>>>>>>>>> to be used at the beginning of the job. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> At the moment, there is no plan to remove that requirement, though >>>>>>>>>>>>>>>>>> there has >>>>>>>>>>>>>>>>>> been occasional discussion about doing so at some point in >>>>>>>>>>>>>>>>>> the future. >>>>>>>>>>>>>>>>>> No >>>>>>>>>>>>>>>>>> promises that it will happen, though - managed environments, >>>>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>> particular, >>>>>>>>>>>>>>>>>> currently object to the idea of changing the allocation >>>>>>>>>>>>>>>>>> on-the-fly. We >>>>>>>>>>>>>>>>>> may, >>>>>>>>>>>>>>>>>> though, make a provision for purely hostfile-based >>>>>>>>>>>>>>>>>> environments (i.e., >>>>>>>>>>>>>>>>>> unmanaged) at some time in the future. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Ralph >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks in advance for any help. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Elena
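The host-targeting advice from the 1.2-era portion of the thread - pass an MPI_Info object whose "host" key lists hosts that also appear in the job's hostfile - looks roughly like the sketch below when written against the C API. It is an illustration only: the host names are a subset of the ones in Elena's example, the child count of 3 is an assumption, and ./slave_exe is again the slave binary from earlier in the thread.

// spawn_with_host_key.cpp - illustrative sketch of the "host" MPI_Info key
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    // The listed hosts must also appear in the hostfile given to the job
    // (via -hostfile or the hostfile environment variable).
    MPI_Info info;
    MPI_Info_create(&info);
    char key[]   = "host";
    char hosts[] = "clstr002,clstr003,octocore01";  // subset of the thread's hosts, as an example
    MPI_Info_set(info, key, hosts);

    char cmd[] = "./slave_exe";
    MPI_Comm children;
    MPI_Comm_spawn(cmd, MPI_ARGV_NULL, 3 /* child count, an assumption */,
                   info, 0, MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}

As noted in the thread, a per-spawn hostfile (as opposed to this host-at-a-time list) was not supported at the time.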