Re: [OMPI users] General ORTE questions

Ralph Castain Thu, 30 Mar 2006 16:51:25 -0500

Hi Rolf

I apologize for the scarce documentation - we are working on it, but have a ways to go. I've tried to address your questions below. Please feel free to ask more!


Ralph

Rolf Vandevaart wrote:

Greetings:
I am new to the Open MPI world, and I have been trying to get a better
understanding of the ORTE environment.  At this point, I have a few
questions that I was hoping someone could answer.

1. I have heard mention of running the ORTE daemons in persistent mode,
however, I can find no details of how to do this.  Are there arguments
to either orted or mpirun to make this work right?

Normally, we start a persistent daemon with:
orted --seed --persistent --scope=public

This will start the daemon and "daemonize" it so it keeps running until told to die. The arguments worth noting are:

(a) --persistent. Tells the daemon to "stay alive" until specifically told to "die".

(b) --scope=[public, private, exclusive]. This actually pertains to the universe, but you'll need to provide it anyway to ensure proper connectivity to anything you try to run. Right now, the daemons default to "exclusive", which means nothing can connect to them except the application that spawned them - no value to anyone if started with the above command! "Private" would restrict contact to you alone - I haven't tested this enough to guarantee its functionality. I usually run them as "public" since security isn't a big concern right now - all this means is that anyone who can read the session directory tree (which is normally "locked" to only you anyway) would be able to connect to the daemon.

(c) --seed. Indicates that this daemon is the first one and therefore will host the data storage for the registry and other central services.

(d) --universe=userid@hostname:universe_name. Allows you to name your universe whatever you like. We use this to allow you to have multiple universes co-existing but separate - I've been explaining the reasons for that elsewhere, but will send them to this list if desired. You don't have to provide this, nor do you have to provide all the fields (e.g., you could just say "--universe=foo" to set the universe name).

You can provide the same options to mpirun, if you like - mpirun will simply start an orted and pass those parameters along, and the orted will merrily stay alive after the specified application completes.
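
To make the combination of options concrete, here is a minimal sketch that only assembles and prints the command line described above. The helper name build_orted_cmd is hypothetical (it is not part of Open MPI), and "foo" is just the example universe name from (d); nothing is launched unless you run the printed line yourself.

```shell
#!/bin/sh
# Sketch only: build the orted command line from the options discussed above.
# build_orted_cmd is a hypothetical helper, not an Open MPI tool.
build_orted_cmd() {
    scope="${1:-public}"   # one of: public, private, exclusive
    universe="${2:-}"      # optional, e.g. "foo" or "userid@hostname:foo"
    cmd="orted --seed --persistent --scope=$scope"
    if [ -n "$universe" ]; then
        cmd="$cmd --universe=$universe"
    fi
    printf '%s\n' "$cmd"
}

# Prints: orted --seed --persistent --scope=public --universe=foo
build_orted_cmd public foo
```

Leaving out the second argument drops --universe entirely, which matches the point that naming the universe is optional.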

2. I stumbled into a binary called orteconsole.  Is this a useful
utility?  I have played with it, but have found no documentation
on it so I am wondering what the state of it is.

I happen to like this utility myself - it allows you to connect to a running universe (persistent or not - you can use it, for example, to connect to the universe of a running application and see what is going on) and explore the OpenRTE internal data structures, issue commands, etc. Not everything is implemented yet - our initial need was just a way of politely telling persistent daemons to "die" and clean up after themselves. I've forgotten which commands have been implemented, but can look at it and write a "man" page for it if you like.

3. I have a similar question about orteprobe.  Is this something
we should know about?

Yes and no - there's nothing secret about it. We use it internally within OpenRTE to "probe" a machine and see if we have a daemon/universe operating on it. Basically, we launch orteprobe on the remote machine - it checks to see if a session directory exists on it, attempts to connect to any universes it finds, and then reports back on its findings. Based on that report, we either launch an orted on the remote machine (to act as our surrogate so we can launch an application on that cell) or connect to an existing universe on the remote machine (and then tell it to launch the application for us).

4. Is there an easy way to view the data in the General Purpose
Registry?  This may be related to my first question, in that I
could imagine having persistent daemons and then I would like
to see what is stored in the registry.

Well, yes and no. Ideally, that would be a command from within the orteconsole function, but I don't think that has been implemented yet. I'd be happy to do so, if that is something you would like (shouldn't take long at all). There is a set of "dump" functions in the registry API for just that purpose. I usually access them via gdb - I attach the debugger to the orted process, then use the dump functions to output the values in the registry.

Not as easy as the orteconsole interface would be, I admit.

5. Is there a way to monitor what processes are running?  For
example, if I am running 3 MPI programs can I run some command
that would tell me this?

Josh has been working on an "orte_ps" command, but I don't think he has it done yet.

6. From what I can tell, there is no way to specify the slots argument
with the -host argument.  For example, I cannot do this:
mpirun -np 8 -host node1:slots=4,node2:slots=4 a.out
Just wanted to confirm that.

Now that's an interesting question! Since Jeff was the one who wrote all that "hostfile" stuff, I'll have to defer to him - a quick glance at the code would seem to support your understanding, but I might have missed something.
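
In the meantime, the same layout can be expressed through a hostfile rather than on the -host argument. The sketch below assumes the usual "hostname slots=N" hostfile syntax; the file name myhosts is a placeholder, and node1, node2, and a.out are taken from the question. It only attempts the launch if mpirun and ./a.out are actually present.

```shell
#!/bin/sh
# Sketch only: give per-node slot counts in a hostfile instead of on -host.
# "myhosts" is a placeholder name; node1/node2 come from the question above.
hostfile=myhosts
cat > "$hostfile" <<'EOF'
node1 slots=4
node2 slots=4
EOF

# Launch only when mpirun and the program exist; otherwise show the command.
if command -v mpirun >/dev/null 2>&1 && [ -x ./a.out ]; then
    mpirun -np 8 --hostfile "$hostfile" ./a.out
else
    echo "would run: mpirun -np 8 --hostfile $hostfile ./a.out"
fi
```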

Thanks for any information,
Rolf
