Re: [OMPI users] General ORTE questions

Rolf Vandevaart Fri, 31 Mar 2006 11:28:04 -0500


Hi Ralph:

Thanks for your information. You said I could ask more so I am! See below.


Ralph Castain wrote On 03/30/06 16:51,:

Hi Rolf
I apologize for the scarce documentation - we are working on it, but have a ways to go. I've tried to address your questions below. Please feel free to ask more!
Ralph

Rolf Vandevaart wrote:
Greetings:
I am new to the Open MPI world, and I have been trying to get a better
understanding of the ORTE environment.  At this point, I have a few
questions that I was hoping someone could answer.

1. I have heard mention of running the ORTE daemons in persistent mode,
however, I can find no details of how to do this.  Are there arguments
to either orted or mpirun to make this work right?
Normally, we start a persistent daemon with:
orted --seed --persistent --scope=public
This will start the daemon and "daemonize" it so it keeps running until told to die. The arguments worth noting are:
(a) --persistent. Tells the daemon to "stay alive" until specifically told to "die"
(b) --scope=[public, private, exclusive]. This actually pertains to the universe, but you'll need to provide it anyway to ensure proper connectivity to anything you try to run. Right now, the daemons default to "exclusive", which means nothing can connect to them except the application that spawned them - no value to anyone if started with the above command! Private would exclude them to contact only from you - I haven't tested this enough to guarantee its functionality. I usually run them as "public" since security isn't a big concern right now - all this means is that anyone who can read the session directory tree (which is normally "locked" to only you anyway) would be able to connect to the daemon.
(c) --seed. Indicates that this daemon is the first one and therefore will host the data storage for the registry and other central services
(d) --universe=userid@hostname:universe_name. Allows you to name your universe to whatever you like. We use this to allow you to have multiple universes co-existing but separate - I've been explaining the reasons for that elsewhere, but will send them to this list if desired. You don't have to provide this, nor do you have to provide all the fields (e.g., you could just say "--universe=foo" to set the universe name).
You can provide the same options to mpirun, if you like - mpirun will simply start an orted and pass those parameters along, and the orted will merrily stay alive after the specified application completes.
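
[Editor's sketch] To make the shape of the --universe value in (d) concrete, here is a small Python illustration of how a string of the form userid@hostname:universe_name breaks into its fields when some are left out. This is not ORTE's own parser. The names UniverseSpec and parse_universe are made up for this example, and the rule used for partial forms (peel off an optional "userid@", then an optional "hostname:") is an assumption; the thread only confirms the full form and the bare name.

```python
from typing import NamedTuple, Optional


class UniverseSpec(NamedTuple):
    """Fields of a --universe value (hypothetical container, not an ORTE type)."""
    userid: Optional[str]
    hostname: Optional[str]
    name: str


def parse_universe(spec: str) -> UniverseSpec:
    """Split 'userid@hostname:universe_name' into its parts.

    ASSUMPTION: the optional 'userid@' and 'hostname:' prefixes are peeled
    off left to right. ORTE's real handling of partial forms may differ.
    """
    userid = None
    hostname = None
    rest = spec
    if "@" in rest:
        userid, rest = rest.split("@", 1)
    if ":" in rest:
        hostname, rest = rest.split(":", 1)
    if not rest:
        raise ValueError(f"no universe name in {spec!r}")
    return UniverseSpec(userid or None, hostname or None, rest)


if __name__ == "__main__":
    print(parse_universe("rolfv@ct4:foo"))  # all three fields given
    print(parse_universe("foo"))            # bare universe name only
```

With the bare form "foo", only the name field is set, which matches the "--universe=foo" shorthand mentioned in (d).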

While I understand all that has been written here in theory, I am still struggling to get things to work.

The persistent daemon seems to be ignored when I do an mpirun. I have watched the system calls and looked at the process tree, and the persistent daemon does not seem to be part of the fun. So, I will be specific about what I am doing, and maybe you can point out what I am doing wrong.

I have a 3 node cluster. ct2, ct4, and ct5. I am launching the job from ct2 and trying to run on ct4 and ct5 which have persistent daemons on them. I have selected the daemon on ct4 to be the seed.

ct4> orted --seed --persistent --scope public -universe foo
ct5> orted --persistent --scope public -universe foo

ct2> mpirun --mca pls_rsh_agent rsh -np 4 -host ct4,ct5 -universe foo my_connectivity -v


While the program is running, I see this on ct4 and ct5.

ps -ef | grep orted

rolfv 9456 1 0 11:24:26 ? 0:00 orted --bootproxy 1 --name 0.0.2 --num_procs 3 --vpid_start 0 --nodename ct4
rolfv 9386 1 0 11:21:30 ? 0:00 orted --seed --persistent --scope public --universe foo


Thanks for any additional details.


*snip*

3. I have a similar question about orteprobe.  Is this something
we should know about?
Yes and no - there's nothing secret about it. We use it internally to OpenRTE to "probe" a machine and see if we have a daemon/universe operating on it. Basically, we launch orteprobe on the remote machine - it checks to see if a session directory exists on it, attempts to connect to any universes it finds, and then reports back on its findings. Based on that report, we either launch an orted on the remote machine (to act as our surrogate so we can launch an application on that cell) or connect to an existing universe on the remote machine (and then tell it to launch the application for us).
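
[Editor's sketch] The probe-then-decide flow described above can be written out as a few lines of Python. This only models the decision and does not call any OpenRTE code. ProbeReport and choose_action are hypothetical names, the report fields are guesses at what such a report would need to carry, and picking the first reachable universe is an assumption made to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProbeReport:
    """What a probe might report back (hypothetical structure)."""
    session_dir_exists: bool
    reachable_universes: List[str] = field(default_factory=list)


def choose_action(report: ProbeReport) -> str:
    """Decide what the launching side does with a probe report."""
    if report.session_dir_exists and report.reachable_universes:
        # An existing universe answered: connect to it and ask it to launch.
        # ASSUMPTION: we simply take the first universe that responded.
        return "connect:" + report.reachable_universes[0]
    # Nothing usable on the remote machine: start a surrogate orted there.
    return "launch_orted"


if __name__ == "__main__":
    print(choose_action(ProbeReport(session_dir_exists=False)))
    print(choose_action(ProbeReport(True, ["foo"])))
```

A session directory with no reachable universe falls through to launching a new orted, the same as having no session directory at all.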
4. Is there an easy way to view the data in the General Purpose
Registry?  This may be related to my first question, in that I
could imagine having persistent daemons and then I would like
to see what is stored in the registry.
Well, yes and no. Ideally, that would be a command from within the orteconsole function, but I don't think that has been implemented yet. I'd be happy to do so, if that is something you would like (shouldn't take long at all). There is a set of "dump" functions in the registry API for just that purpose. I usually access them via gdb - I attach the debugger to the orted process, then use the dump functions to output the values in the registry.


What exactly do you type in for the dump functions?  I saw these functions,
but could not get them to fire properly.

*snip*

Regards,
Rolf

--

=========================
rolf.vandeva...@sun.com
781-442-3043
=========================
