Not off the top of my head. However, as noted earlier, there is absolutely no 
advantage to a singleton vs. an mpirun start - all the singleton does is 
immediately fork/exec "mpirun" to support the rest of the job. In both cases 
you have a daemon running the job; the only difference is the number of 
characters the user types to start it.


On Aug 30, 2012, at 8:44 AM, Brian Budge <brian.bu...@gmail.com> wrote:

> In the event that I need to get this up-and-running soon (I do need
> something working within 2 weeks), can you recommend an older version
> where this is expected to work?
> 
> Thanks,
>  Brian
> 
> On Tue, Aug 28, 2012 at 4:58 PM, Brian Budge <brian.bu...@gmail.com> wrote:
>> Thanks!
>> 
>> On Tue, Aug 28, 2012 at 4:57 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> Yeah, I'm seeing the hang as well when running across multiple machines. 
>>> Let me dig a little and get this fixed.
>>> 
>>> Thanks
>>> Ralph
>>> 
>>> On Aug 28, 2012, at 4:51 PM, Brian Budge <brian.bu...@gmail.com> wrote:
>>> 
>>>> Hmmm, I went to the build directories of openmpi for my two machines,
>>>> went into the orte/test/mpi directory and made the executables on both
>>>> machines.  I set the hostsfile in the env variable on the "master"
>>>> machine.
>>>> 
>>>> Here's the output:
>>>> 
>>>> OMPI_MCA_orte_default_hostfile=/home/budgeb/p4/pseb/external/install/openmpi-1.6.1/orte/test/mpi/hostsfile
>>>> ./simple_spawn
>>>> Parent [pid 97504] starting up!
>>>> 0 completed MPI_Init
>>>> Parent [pid 97504] about to spawn!
>>>> Parent [pid 97507] starting up!
>>>> Parent [pid 97508] starting up!
>>>> Parent [pid 30626] starting up!
>>>> ^C
>>>> zsh: interrupt  OMPI_MCA_orte_default_hostfile= ./simple_spawn
>>>> 
>>>> I had to ^C to kill the hung process.
>>>> 
>>>> When I run using mpirun:
>>>> 
>>>> OMPI_MCA_orte_default_hostfile=/home/budgeb/p4/pseb/external/install/openmpi-1.6.1/orte/test/mpi/hostsfile
>>>> mpirun -np 1 ./simple_spawn
>>>> Parent [pid 97511] starting up!
>>>> 0 completed MPI_Init
>>>> Parent [pid 97511] about to spawn!
>>>> Parent [pid 97513] starting up!
>>>> Parent [pid 30762] starting up!
>>>> Parent [pid 30764] starting up!
>>>> Parent done with spawn
>>>> Parent sending message to child
>>>> 1 completed MPI_Init
>>>> Hello from the child 1 of 3 on host budgeb-sandybridge pid 97513
>>>> 0 completed MPI_Init
>>>> Hello from the child 0 of 3 on host budgeb-interlagos pid 30762
>>>> 2 completed MPI_Init
>>>> Hello from the child 2 of 3 on host budgeb-interlagos pid 30764
>>>> Child 1 disconnected
>>>> Child 0 received msg: 38
>>>> Child 0 disconnected
>>>> Parent disconnected
>>>> Child 2 disconnected
>>>> 97511: exiting
>>>> 97513: exiting
>>>> 30762: exiting
>>>> 30764: exiting
>>>> 
>>>> As you can see, I'm using openmpi v 1.6.1.  I just barely freshly
>>>> installed on both machines using the default configure options.
>>>> 
>>>> Thanks for all your help.
>>>> 
>>>> Brian
>>>> 
>>>> On Tue, Aug 28, 2012 at 4:39 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> Looks to me like it didn't find your executable - could be a question of 
>>>>> where it exists relative to where you are running. If you look in your 
>>>>> OMPI source tree at the orte/test/mpi directory, you'll see an example 
>>>>> program "simple_spawn.c" there. Just "make simple_spawn" and execute that 
>>>>> with your default hostfile set - does it work okay?
>>>>> 
>>>>> It works fine for me, hence the question.
>>>>> 
>>>>> Also, what OMPI version are you using?
>>>>> 
>>>>> On Aug 28, 2012, at 4:25 PM, Brian Budge <brian.bu...@gmail.com> wrote:
>>>>> 
>>>>>> I see.  Okay.  So, I just tried removing the check for universe size,
>>>>>> and set the universe size to 2.  Here's my output:
>>>>>> 
>>>>>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib
>>>>>> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe
>>>>>> [budgeb-interlagos:29965] [[4156,0],0] ORTE_ERROR_LOG: Fatal in file
>>>>>> base/plm_base_receive.c at line 253
>>>>>> [budgeb-interlagos:29963] [[4156,1],0] ORTE_ERROR_LOG: The specified
>>>>>> application failed to start in file dpm_orte.c at line 785
>>>>>> 
>>>>>> The corresponding run with mpirun still works.
>>>>>> 
>>>>>> Thanks,
>>>>>> Brian
>>>>>> 
>>>>>> On Tue, Aug 28, 2012 at 2:46 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>> I see the issue - it's here:
>>>>>>> 
>>>>>>>> MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, &flag);
>>>>>>>> 
>>>>>>>> if(!flag) {
>>>>>>>>    std::cerr << "no universe size" << std::endl;
>>>>>>>>    return -1;
>>>>>>>> }
>>>>>>>> universeSize = *puniverseSize;
>>>>>>>> if(universeSize == 1) {
>>>>>>>>    std::cerr << "cannot start slaves... not enough nodes" << std::endl;
>>>>>>>> }
>>>>>>> 
>>>>>>> The universe size is set to 1 on a singleton because the attribute gets 
>>>>>>> set at the beginning of time - we haven't any way to go back and change 
>>>>>>> it. The sequence of events explains why. The singleton starts up and 
>>>>>>> sets its attributes, including universe_size. It also spins off an orte 
>>>>>>> daemon to act as its own private "mpirun" in case you call comm_spawn. 
>>>>>>> At this point, however, no hostfile has been read - the singleton is 
>>>>>>> just an MPI proc doing its own thing, and the orte daemon is just 
>>>>>>> sitting there on "stand-by".
>>>>>>> 
>>>>>>> When your app calls comm_spawn, then the orte daemon gets called to 
>>>>>>> launch the new procs. At that time, it (not the original singleton!) 
>>>>>>> reads the hostfile to find out how many nodes are around, and then does 
>>>>>>> the launch.
>>>>>>> 
>>>>>>> You are trying to check the number of nodes from within the singleton, 
>>>>>>> which won't work - it has no way of discovering that info.
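>>>>>>> For illustration only - a rough, untested sketch of a singleton master that
>>>>>>> skips the universe-size check entirely and just tells comm_spawn how many
>>>>>>> children it wants (the child count and executable name below are made up,
>>>>>>> not taken from your code):
>>>>>>> 
>>>>>>> #include <mpi.h>
>>>>>>> 
>>>>>>> int main(int argc, char **argv) {
>>>>>>>     MPI_Init(&argc, &argv);
>>>>>>> 
>>>>>>>     // Pick the child count explicitly; MPI_UNIVERSE_SIZE is 1 in a singleton.
>>>>>>>     int nChildren = 2;
>>>>>>>     char cmd[] = "./slave_exe";   // hypothetical child executable
>>>>>>>     MPI_Comm children;
>>>>>>> 
>>>>>>>     // The orte daemon reads the default hostfile at spawn time, so the
>>>>>>>     // children can land on the other hosts it lists.
>>>>>>>     MPI_Comm_spawn(cmd, MPI_ARGV_NULL, nChildren, MPI_INFO_NULL,
>>>>>>>                    0, MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);
>>>>>>> 
>>>>>>>     MPI_Comm_disconnect(&children);
>>>>>>>     MPI_Finalize();
>>>>>>>     return 0;
>>>>>>> }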
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Aug 28, 2012, at 2:38 PM, Brian Budge <brian.bu...@gmail.com> wrote:
>>>>>>> 
>>>>>>>>> echo hostsfile
>>>>>>>> localhost
>>>>>>>> budgeb-sandybridge
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Brian
>>>>>>>> 
>>>>>>>> On Tue, Aug 28, 2012 at 2:36 PM, Ralph Castain <r...@open-mpi.org> 
>>>>>>>> wrote:
>>>>>>>>> Hmmm...what is in your "hostsfile"?
>>>>>>>>> 
>>>>>>>>> On Aug 28, 2012, at 2:33 PM, Brian Budge <brian.bu...@gmail.com> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi Ralph -
>>>>>>>>>> 
>>>>>>>>>> Thanks for confirming this is possible.  I'm trying this and 
>>>>>>>>>> currently
>>>>>>>>>> failing.  Perhaps there's something I'm missing in the code to make
>>>>>>>>>> this work.  Here are the two instantiations and their outputs:
>>>>>>>>>> 
>>>>>>>>>>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib
>>>>>>>>>>>  OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe
>>>>>>>>>> cannot start slaves... not enough nodes
>>>>>>>>>> 
>>>>>>>>>>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib
>>>>>>>>>>>  OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile mpirun -n 1 
>>>>>>>>>>> ./master_exe
>>>>>>>>>> master spawned 1 slaves...
>>>>>>>>>> slave responding...
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> The code:
>>>>>>>>>> 
>>>>>>>>>> //master.cpp
>>>>>>>>>> #include <mpi.h>
>>>>>>>>>> #include <boost/filesystem.hpp>
>>>>>>>>>> #include <iostream>
>>>>>>>>>> #include <cstring>   // memcpy
>>>>>>>>>> #include <alloca.h>  // alloca
>>>>>>>>>> 
>>>>>>>>>> int main(int argc, char **args) {
>>>>>>>>>> int worldSize, universeSize, *puniverseSize, flag;
>>>>>>>>>> 
>>>>>>>>>> MPI_Comm everyone; //intercomm
>>>>>>>>>> boost::filesystem::path curPath =
>>>>>>>>>> boost::filesystem::absolute(boost::filesystem::current_path());
>>>>>>>>>> 
>>>>>>>>>> std::string toRun = (curPath / "slave_exe").string();
>>>>>>>>>> 
>>>>>>>>>> int ret = MPI_Init(&argc, &args);
>>>>>>>>>> 
>>>>>>>>>> if(ret != MPI_SUCCESS) {
>>>>>>>>>>    std::cerr << "failed init" << std::endl;
>>>>>>>>>>    return -1;
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> MPI_Comm_size(MPI_COMM_WORLD, &worldSize);
>>>>>>>>>> 
>>>>>>>>>> if(worldSize != 1) {
>>>>>>>>>>    std::cerr << "too many masters" << std::endl;
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, 
>>>>>>>>>> &flag);
>>>>>>>>>> 
>>>>>>>>>> if(!flag) {
>>>>>>>>>>    std::cerr << "no universe size" << std::endl;
>>>>>>>>>>    return -1;
>>>>>>>>>> }
>>>>>>>>>> universeSize = *puniverseSize;
>>>>>>>>>> if(universeSize == 1) {
>>>>>>>>>>    std::cerr << "cannot start slaves... not enough nodes" << 
>>>>>>>>>> std::endl;
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> char *buf = (char*)alloca(toRun.size() + 1);
>>>>>>>>>> memcpy(buf, toRun.c_str(), toRun.size());
>>>>>>>>>> buf[toRun.size()] = '\0';
>>>>>>>>>> 
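>>>>>>>>>> // Note: when this runs as a singleton (no mpirun), MPI_UNIVERSE_SIZE is 1,
>>>>>>>>>> // so universeSize-1 asks comm_spawn for zero slaves - see the explanation
>>>>>>>>>> // earlier in the thread.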
>>>>>>>>>> MPI_Comm_spawn(buf, MPI_ARGV_NULL, universeSize-1, MPI_INFO_NULL,
>>>>>>>>>> 0, MPI_COMM_SELF, &everyone,
>>>>>>>>>>               MPI_ERRCODES_IGNORE);
>>>>>>>>>> 
>>>>>>>>>> std::cerr << "master spawned " << universeSize-1 << " slaves..."
>>>>>>>>>> << std::endl;
>>>>>>>>>> 
>>>>>>>>>> MPI_Finalize();
>>>>>>>>>> 
>>>>>>>>>> return 0;
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> //slave.cpp
>>>>>>>>>> #include <mpi.h>
>>>>>>>>>> #include <iostream>  // std::cerr
>>>>>>>>>> 
>>>>>>>>>> int main(int argc, char **args) {
>>>>>>>>>> int size;
>>>>>>>>>> MPI_Comm parent;
>>>>>>>>>> MPI_Init(&argc, &args);
>>>>>>>>>> 
>>>>>>>>>> MPI_Comm_get_parent(&parent);
>>>>>>>>>> 
>>>>>>>>>> if(parent == MPI_COMM_NULL) {
>>>>>>>>>>    std::cerr << "slave has no parent" << std::endl;
>>>>>>>>>> }
>>>>>>>>>> MPI_Comm_remote_size(parent, &size);
>>>>>>>>>> if(size != 1) {
>>>>>>>>>>    std::cerr << "parent size is " << size << std::endl;
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> std::cerr << "slave responding..." << std::endl;
>>>>>>>>>> 
>>>>>>>>>> MPI_Finalize();
>>>>>>>>>> 
>>>>>>>>>> return 0;
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Any ideas?  Thanks for any help.
>>>>>>>>>> 
>>>>>>>>>> Brian
>>>>>>>>>> 
>>>>>>>>>> On Wed, Aug 22, 2012 at 9:03 AM, Ralph Castain <r...@open-mpi.org> 
>>>>>>>>>> wrote:
>>>>>>>>>>> It really is just that simple :-)
>>>>>>>>>>> 
>>>>>>>>>>> On Aug 22, 2012, at 8:56 AM, Brian Budge <brian.bu...@gmail.com> 
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Okay.  Is there a tutorial or FAQ for setting everything up?  Or 
>>>>>>>>>>>> is it
>>>>>>>>>>>> really just that simple?  I don't need to run a copy of the orte
>>>>>>>>>>>> server somewhere?
>>>>>>>>>>>> 
>>>>>>>>>>>> if my current ip is 192.168.0.1,
>>>>>>>>>>>> 
>>>>>>>>>>>> 0 > echo 192.168.0.11 > /tmp/hostfile
>>>>>>>>>>>> 1 > echo 192.168.0.12 >> /tmp/hostfile
>>>>>>>>>>>> 2 > export OMPI_MCA_orte_default_hostfile=/tmp/hostfile
>>>>>>>>>>>> 3 > ./mySpawningExe
>>>>>>>>>>>> 
>>>>>>>>>>>> At this point, mySpawningExe will be the master, running on
>>>>>>>>>>>> 192.168.0.1, and I can have spawned, for example, childExe on
>>>>>>>>>>>> 192.168.0.11 and 192.168.0.12?  Or childExe1 on 192.168.0.11 and
>>>>>>>>>>>> childExe2 on 192.168.0.12?
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks for the help.
>>>>>>>>>>>> 
>>>>>>>>>>>> Brian
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Aug 22, 2012 at 7:15 AM, Ralph Castain <r...@open-mpi.org> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Sure, that's still true on all 1.3 or above releases. All you 
>>>>>>>>>>>>> need to do is set the hostfile envar so we pick it up:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> OMPI_MCA_orte_default_hostfile=<foo>
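>>>>>>>>>>>>> 
>>>>>>>>>>>>> If you'd rather not export it from the shell, the same thing should work
>>>>>>>>>>>>> from inside the master itself, as long as it happens before MPI_Init
>>>>>>>>>>>>> (the forked orte daemon should inherit the environment). Untested
>>>>>>>>>>>>> sketch; the hostfile path is just a placeholder:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> #include <mpi.h>
>>>>>>>>>>>>> #include <cstdlib>
>>>>>>>>>>>>> 
>>>>>>>>>>>>> int main(int argc, char **argv) {
>>>>>>>>>>>>>     // Must be set before MPI_Init so the spawned orte daemon sees it.
>>>>>>>>>>>>>     setenv("OMPI_MCA_orte_default_hostfile", "/tmp/hostfile", 1);
>>>>>>>>>>>>> 
>>>>>>>>>>>>>     MPI_Init(&argc, &argv);
>>>>>>>>>>>>>     /* ... MPI_Comm_spawn as usual ... */
>>>>>>>>>>>>>     MPI_Finalize();
>>>>>>>>>>>>>     return 0;
>>>>>>>>>>>>> }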
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Aug 21, 2012, at 7:23 PM, Brian Budge <brian.bu...@gmail.com> 
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi.  I know this is an old thread, but I'm curious if there are 
>>>>>>>>>>>>>> any
>>>>>>>>>>>>>> tutorials describing how to set this up?  Is this still 
>>>>>>>>>>>>>> available on
>>>>>>>>>>>>>> newer open mpi versions?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Brian
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, Jan 4, 2008 at 7:57 AM, Ralph Castain <r...@lanl.gov> 
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> Hi Elena
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I'm copying this to the user list just to correct a 
>>>>>>>>>>>>>>> mis-statement on my part
>>>>>>>>>>>>>>> in an earlier message that went there. I had stated that a 
>>>>>>>>>>>>>>> singleton could
>>>>>>>>>>>>>>> comm_spawn onto other nodes listed in a hostfile by setting an 
>>>>>>>>>>>>>>> environmental
>>>>>>>>>>>>>>> variable that pointed us to the hostfile.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> This is incorrect in the 1.2 code series. That series does not 
>>>>>>>>>>>>>>> allow
>>>>>>>>>>>>>>> singletons to read a hostfile at all. Hence, any comm_spawn 
>>>>>>>>>>>>>>> done by a
>>>>>>>>>>>>>>> singleton can only launch child processes on the singleton's 
>>>>>>>>>>>>>>> local host.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> This situation has been corrected for the upcoming 1.3 code 
>>>>>>>>>>>>>>> series. For the
>>>>>>>>>>>>>>> 1.2 series, though, you will have to do it via an mpirun 
>>>>>>>>>>>>>>> command line.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Sorry for the confusion - I sometimes have too many code 
>>>>>>>>>>>>>>> families to keep
>>>>>>>>>>>>>>> straight in this old mind!
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 1/4/08 5:10 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> 
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hello Ralph,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thank you very much for the explanations.
>>>>>>>>>>>>>>>> But I still do not get it running...
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> For the case
>>>>>>>>>>>>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host 
>>>>>>>>>>>>>>>> my_master.exe
>>>>>>>>>>>>>>>> everything works.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> For the case
>>>>>>>>>>>>>>>> ./my_master.exe
>>>>>>>>>>>>>>>> it does not.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I did:
>>>>>>>>>>>>>>>> - create my_hostfile and put it in the 
>>>>>>>>>>>>>>>> $HOME/.openmpi/components/
>>>>>>>>>>>>>>>> my_hostfile :
>>>>>>>>>>>>>>>> bollenstreek slots=2 max_slots=3
>>>>>>>>>>>>>>>> octocore01 slots=8  max_slots=8
>>>>>>>>>>>>>>>> octocore02 slots=8  max_slots=8
>>>>>>>>>>>>>>>> clstr000 slots=2 max_slots=3
>>>>>>>>>>>>>>>> clstr001 slots=2 max_slots=3
>>>>>>>>>>>>>>>> clstr002 slots=2 max_slots=3
>>>>>>>>>>>>>>>> clstr003 slots=2 max_slots=3
>>>>>>>>>>>>>>>> clstr004 slots=2 max_slots=3
>>>>>>>>>>>>>>>> clstr005 slots=2 max_slots=3
>>>>>>>>>>>>>>>> clstr006 slots=2 max_slots=3
>>>>>>>>>>>>>>>> clstr007 slots=2 max_slots=3
>>>>>>>>>>>>>>>> - setenv OMPI_MCA_rds_hostfile_path my_hostfile (I  put it in 
>>>>>>>>>>>>>>>> .tcshrc and
>>>>>>>>>>>>>>>> then source .tcshrc)
>>>>>>>>>>>>>>>> - in my_master.cpp I did
>>>>>>>>>>>>>>>> MPI_Info info1;
>>>>>>>>>>>>>>>> MPI_Info_create(&info1);
>>>>>>>>>>>>>>>> char* hostname =
>>>>>>>>>>>>>>>> "clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02";
>>>>>>>>>>>>>>>> MPI_Info_set(info1, "host", hostname);
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> _intercomm = intracomm.Spawn("./childexe", argv1, _nProc, 
>>>>>>>>>>>>>>>> info1, 0,
>>>>>>>>>>>>>>>> MPI_ERRCODES_IGNORE);
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> - After I call the executable, I've got this error message
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> bollenstreek: > ./my_master
>>>>>>>>>>>>>>>> number of processes to run: 1
>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>> Some of the requested hosts are not included in the current 
>>>>>>>>>>>>>>>> allocation for
>>>>>>>>>>>>>>>> the application:
>>>>>>>>>>>>>>>> ./childexe
>>>>>>>>>>>>>>>> The requested hosts were:
>>>>>>>>>>>>>>>> clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Verify that you have mapped the allocated resources properly 
>>>>>>>>>>>>>>>> using the
>>>>>>>>>>>>>>>> --host specification.
>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource 
>>>>>>>>>>>>>>>> in file
>>>>>>>>>>>>>>>> base/rmaps_base_support_fns.c at line 225
>>>>>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource 
>>>>>>>>>>>>>>>> in file
>>>>>>>>>>>>>>>> rmaps_rr.c at line 478
>>>>>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource 
>>>>>>>>>>>>>>>> in file
>>>>>>>>>>>>>>>> base/rmaps_base_map_job.c at line 210
>>>>>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource 
>>>>>>>>>>>>>>>> in file
>>>>>>>>>>>>>>>> rmgr_urm.c at line 372
>>>>>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource 
>>>>>>>>>>>>>>>> in file
>>>>>>>>>>>>>>>> communicator/comm_dyn.c at line 608
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Did I miss something?
>>>>>>>>>>>>>>>> Thanks for help!
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Elena
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>> From: Ralph H Castain [mailto:r...@lanl.gov]
>>>>>>>>>>>>>>>> Sent: Tuesday, December 18, 2007 3:50 PM
>>>>>>>>>>>>>>>> To: Elena Zhebel; Open MPI Users <us...@open-mpi.org>
>>>>>>>>>>>>>>>> Cc: Ralph H Castain
>>>>>>>>>>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster 
>>>>>>>>>>>>>>>> configuration
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On 12/18/07 7:35 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> 
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks a lot! Now it works!
>>>>>>>>>>>>>>>>> The solution is to use mpirun -n 1 -hostfile my.hosts *.exe 
>>>>>>>>>>>>>>>>> and pass
>>>>>>>>>>>>>>>> MPI_Info
>>>>>>>>>>>>>>>>> Key to the Spawn function!
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> One more question: is it necessary to start my "master" 
>>>>>>>>>>>>>>>>> program with
>>>>>>>>>>>>>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host 
>>>>>>>>>>>>>>>>> my_master.exe ?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> No, it isn't necessary - assuming that my_master_host is the 
>>>>>>>>>>>>>>>> first host
>>>>>>>>>>>>>>>> listed in your hostfile! If you are only executing one 
>>>>>>>>>>>>>>>> my_master.exe (i.e.,
>>>>>>>>>>>>>>>> you gave -n 1 to mpirun), then we will automatically map that 
>>>>>>>>>>>>>>>> process onto
>>>>>>>>>>>>>>>> the first host in your hostfile.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> If you want my_master.exe to go on someone other than the 
>>>>>>>>>>>>>>>> first host in the
>>>>>>>>>>>>>>>> file, then you have to give us the -host option.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Are there other possibilities for easy start?
>>>>>>>>>>>>>>>>> I would say just run ./my_master.exe, but then the master process
>>>>>>>>>>>>>>>>> doesn't know about the hosts available on the network.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> You can set the hostfile parameter in your environment instead 
>>>>>>>>>>>>>>>> of on the
>>>>>>>>>>>>>>>> command line. Just set OMPI_MCA_rds_hostfile_path = my.hosts.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> You can then just run ./my_master.exe on the host where you 
>>>>>>>>>>>>>>>> want the master
>>>>>>>>>>>>>>>> to reside - everything should work the same.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Just as an FYI: the name of that environmental variable is 
>>>>>>>>>>>>>>>> going to change
>>>>>>>>>>>>>>>> in the 1.3 release, but everything will still work the same.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hope that helps
>>>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks and regards,
>>>>>>>>>>>>>>>>> Elena
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>> From: Ralph H Castain [mailto:r...@lanl.gov]
>>>>>>>>>>>>>>>>> Sent: Monday, December 17, 2007 5:49 PM
>>>>>>>>>>>>>>>>> To: Open MPI Users <us...@open-mpi.org>; Elena Zhebel
>>>>>>>>>>>>>>>>> Cc: Ralph H Castain
>>>>>>>>>>>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster 
>>>>>>>>>>>>>>>>> configuration
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On 12/17/07 8:19 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> 
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hello Ralph,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thank you for your answer.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I'm using OpenMPI 1.2.3. , compiler glibc232, Linux Suse 
>>>>>>>>>>>>>>>>>> 10.0.
>>>>>>>>>>>>>>>>>> My "master" executable runs only on the one local host, then 
>>>>>>>>>>>>>>>>>> it spawns
>>>>>>>>>>>>>>>>>> "slaves" (with MPI::Intracomm::Spawn).
>>>>>>>>>>>>>>>>>> My question was: how to determine the hosts where these 
>>>>>>>>>>>>>>>>>> "slaves" will be
>>>>>>>>>>>>>>>>>> spawned?
>>>>>>>>>>>>>>>>>> You said: "You have to specify all of the hosts that can be 
>>>>>>>>>>>>>>>>>> used by
>>>>>>>>>>>>>>>>>> your job
>>>>>>>>>>>>>>>>>> in the original hostfile". How can I specify the host file? 
>>>>>>>>>>>>>>>>>> I can not
>>>>>>>>>>>>>>>>>> find it
>>>>>>>>>>>>>>>>>> in the documentation.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hmmm...sorry about the lack of documentation. I always 
>>>>>>>>>>>>>>>>> assumed that the MPI
>>>>>>>>>>>>>>>>> folks in the project would document such things since it has 
>>>>>>>>>>>>>>>>> little to do
>>>>>>>>>>>>>>>>> with the underlying run-time, but I guess that fell through 
>>>>>>>>>>>>>>>>> the cracks.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> There are two parts to your question:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 1. how to specify the hosts to be used for the entire job. I 
>>>>>>>>>>>>>>>>> believe that
>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>> somewhat covered here:
>>>>>>>>>>>>>>>>> http://www.open-mpi.org/faq/?category=running#simple-spmd-run
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> That FAQ tells you what a hostfile should look like, though 
>>>>>>>>>>>>>>>>> you may already
>>>>>>>>>>>>>>>>> know that. Basically, we require that you list -all- of the 
>>>>>>>>>>>>>>>>> nodes that both
>>>>>>>>>>>>>>>>> your master and slave programs will use.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 2. how to specify which nodes are available for the master, 
>>>>>>>>>>>>>>>>> and which for
>>>>>>>>>>>>>>>>> the slave.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> You would specify the host for your master on the mpirun 
>>>>>>>>>>>>>>>>> command line with
>>>>>>>>>>>>>>>>> something like:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host 
>>>>>>>>>>>>>>>>> my_master.exe
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> This directs Open MPI to map that specified executable on the 
>>>>>>>>>>>>>>>>> specified
>>>>>>>>>>>>>>>> host
>>>>>>>>>>>>>>>>> - note that my_master_host must have been in my_hostfile.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Inside your master, you would create an MPI_Info key "host" 
>>>>>>>>>>>>>>>>> that has a
>>>>>>>>>>>>>>>> value
>>>>>>>>>>>>>>>>> consisting of a string "host1,host2,host3" identifying the 
>>>>>>>>>>>>>>>>> hosts you want
>>>>>>>>>>>>>>>>> your slave to execute upon. Those hosts must have been 
>>>>>>>>>>>>>>>>> included in
>>>>>>>>>>>>>>>>> my_hostfile. Include that key in the MPI_Info array passed to 
>>>>>>>>>>>>>>>>> your Spawn.
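>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> A bare-bones, untested illustration of that call (the host names, the
>>>>>>>>>>>>>>>>> slave count and the executable name are placeholders, and all three
>>>>>>>>>>>>>>>>> hosts must be listed in my_hostfile):
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> MPI_Info info;
>>>>>>>>>>>>>>>>> MPI_Info_create(&info);
>>>>>>>>>>>>>>>>> MPI_Info_set(info, "host", "host1,host2,host3");
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> MPI_Comm slaves;
>>>>>>>>>>>>>>>>> char cmd[] = "./my_slave.exe";
>>>>>>>>>>>>>>>>> MPI_Comm_spawn(cmd, MPI_ARGV_NULL, 3, info,
>>>>>>>>>>>>>>>>>                0, MPI_COMM_SELF, &slaves, MPI_ERRCODES_IGNORE);
>>>>>>>>>>>>>>>>> MPI_Info_free(&info);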
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> We don't currently support providing a hostfile for the 
>>>>>>>>>>>>>>>>> slaves (as opposed
>>>>>>>>>>>>>>>>> to the host-at-a-time string above). This may become 
>>>>>>>>>>>>>>>>> available in a future
>>>>>>>>>>>>>>>>> release - TBD.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hope that helps
>>>>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks and regards,
>>>>>>>>>>>>>>>>>> Elena
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>>> From: users-boun...@open-mpi.org 
>>>>>>>>>>>>>>>>>> [mailto:users-boun...@open-mpi.org] On
>>>>>>>>>>>>>>>>>> Behalf Of Ralph H Castain
>>>>>>>>>>>>>>>>>> Sent: Monday, December 17, 2007 3:31 PM
>>>>>>>>>>>>>>>>>> To: Open MPI Users <us...@open-mpi.org>
>>>>>>>>>>>>>>>>>> Cc: Ralph H Castain
>>>>>>>>>>>>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster
>>>>>>>>>>>>>>>>>> configuration
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On 12/12/07 5:46 AM, "Elena Zhebel" 
>>>>>>>>>>>>>>>>>> <ezhe...@fugro-jason.com> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I'm working on an MPI application where I'm using OpenMPI instead of
>>>>>>>>>>>>>>>>>>> MPICH.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> In my "master" program I call the function MPI::Intracomm::Spawn, which
>>>>>>>>>>>>>>>>>>> spawns "slave" processes. It is not clear to me how to spawn the "slave"
>>>>>>>>>>>>>>>>>>> processes over the network. Currently the "master" creates "slaves" on
>>>>>>>>>>>>>>>>>>> the same host.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> If I use 'mpirun --hostfile openmpi.hosts' then processes are spawned
>>>>>>>>>>>>>>>>>>> over the network as expected. But now I need to spawn processes over
>>>>>>>>>>>>>>>>>>> the network from my own executable using MPI::Intracomm::Spawn - how
>>>>>>>>>>>>>>>>>>> can I achieve this?
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I'm not sure from your description exactly what you are 
>>>>>>>>>>>>>>>>>> trying to do,
>>>>>>>>>>>>>>>>>> nor in
>>>>>>>>>>>>>>>>>> what environment this is all operating within or what 
>>>>>>>>>>>>>>>>>> version of Open
>>>>>>>>>>>>>>>>>> MPI
>>>>>>>>>>>>>>>>>> you are using. Setting aside the environment and version 
>>>>>>>>>>>>>>>>>> issue, I'm
>>>>>>>>>>>>>>>>>> guessing
>>>>>>>>>>>>>>>>>> that you are running your executable over some specified set 
>>>>>>>>>>>>>>>>>> of hosts,
>>>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>> want to provide a different hostfile that specifies the 
>>>>>>>>>>>>>>>>>> hosts to be
>>>>>>>>>>>>>>>>>> used for
>>>>>>>>>>>>>>>>>> the "slave" processes. Correct?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> If that is correct, then I'm afraid you can't do that in any 
>>>>>>>>>>>>>>>>>> version
>>>>>>>>>>>>>>>>>> of Open
>>>>>>>>>>>>>>>>>> MPI today. You have to specify all of the hosts that can be 
>>>>>>>>>>>>>>>>>> used by
>>>>>>>>>>>>>>>>>> your job
>>>>>>>>>>>>>>>>>> in the original hostfile. You can then specify a subset of 
>>>>>>>>>>>>>>>>>> those hosts
>>>>>>>>>>>>>>>>>> to be
>>>>>>>>>>>>>>>>>> used by your original "master" program, and then specify a 
>>>>>>>>>>>>>>>>>> different
>>>>>>>>>>>>>>>>>> subset
>>>>>>>>>>>>>>>>>> to be used by the "slaves" when calling Spawn.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> But the system requires that you tell it -all- of the hosts 
>>>>>>>>>>>>>>>>>> that are
>>>>>>>>>>>>>>>>>> going
>>>>>>>>>>>>>>>>>> to be used at the beginning of the job.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> At the moment, there is no plan to remove that requirement, 
>>>>>>>>>>>>>>>>>> though
>>>>>>>>>>>>>>>>>> there has
>>>>>>>>>>>>>>>>>> been occasional discussion about doing so at some point in 
>>>>>>>>>>>>>>>>>> the future.
>>>>>>>>>>>>>>>>>> No
>>>>>>>>>>>>>>>>>> promises that it will happen, though - managed environments, 
>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>> particular,
>>>>>>>>>>>>>>>>>> currently object to the idea of changing the allocation 
>>>>>>>>>>>>>>>>>> on-the-fly. We
>>>>>>>>>>>>>>>>>> may,
>>>>>>>>>>>>>>>>>> though, make a provision for purely hostfile-based 
>>>>>>>>>>>>>>>>>> environments (i.e.,
>>>>>>>>>>>>>>>>>> unmanaged) at some time in the future.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks in advance for any help.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Elena
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 