On Mar 14, 2012, at 5:44 PM, Ralph Castain wrote:

> Hi Reuti
> 
> I appreciate your help on this thread - I confess I'm puzzled by it. As you 
> know, OMPI doesn't use SGE to launch the individual processes, nor does SGE 
> even know they exist. All SGE is used for is to launch the OMPI daemons 
> (orteds). This is done as a single qrsh call, so won't all the daemons wind 
> up being executed against the same queue regardless of how many queues exist 
> in the system?

Yes, per machine they will then start in one queue (the one the first and only 
`qrsh -inherit ...` call is assigned to). But between machines they can get 
different queues. I would also assume that this is not relevant to Open MPI. 
You could call it a cosmetic flaw, but it's worth noting, as some applications 
expect the same $TMPDIR to be present on all machines under exactly the same 
name, and this can't be guaranteed if different queues were used for a job.
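
For illustration, SGE derives $TMPDIR from the queue name, so two slaves of one 
job can end up with paths like these (a sketch; the job id, host, and queue 
names are made up):

  node01$ echo $TMPDIR
  /tmp/4242.1.all.q
  node02$ echo $TMPDIR
  /tmp/4242.1.extra.q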

> 
> Given that the daemons then fork/exec the MPI processes (outside of qrsh), I 
> would think they would inherit that nice setting as well, and so all the 
> procs will be running at the same nice level too.
> 
> As for TMPDIR, we don't forward that unless specifically directed to do so, 
> which I didn't see on their cmd line.

Open MPI's SGE integration forwards all environment variables from the master 
task to all nodes via the -V option supplied in the Open MPI source. But for 
TMPDIR this does no harm, as SGE will override it again with the real $TMPDIR 
according to the queue selected on each particular slave machine.
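
To check what each node actually sees, one could run something like this from 
inside the job (a minimal sketch; Open MPI's --pernode option starts one 
process per node):

  mpiexec --pernode sh -c 'echo "$(hostname): TMPDIR=$TMPDIR"'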

If this is then overridden once more by the application distributing the 
variable to the slaves, things can fail because the expected $TMPDIR isn't 
there. As said: maybe this is unrelated to the issue.

I just tested with two different queues on two machines and a small mpihello, 
and it works as expected.
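
The test was of this shape (a sketch, not the exact commands; the queue, host, 
and PE names are examples):

  # force the two hosts into two different queues
  qsub -pe orte 4 -q all.q@node01,extra.q@node02 mpihello.sh

  # mpihello.sh
  #!/bin/sh
  mpiexec -np $NSLOTS ./mpihello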

Joshua: is the CentOS 6 installation the same on all nodes, and did you 
recompile the application against the current version of the library? By 
"threads" do you mean "processes"?

-- Reuti


> On Mar 14, 2012, at 2:33 AM, Reuti wrote:
> 
>> Hi,
>> 
>> On Mar 14, 2012, at 4:02 AM, Joshua Baker-LePain wrote:
>> 
>>> On Tue, 13 Mar 2012 at 5:31pm, Ralph Castain wrote
>>> 
>>>> FWIW: I have a Centos6 system myself, and I have no problems running OMPI 
>>>> on it (1.4 or 1.5). I can try building it the same way you do and see what 
>>>> happens.
>>> 
>>> I can run as many threads as I like on a single system with no problems, 
>>> even if those threads are running at different nice levels.
>> 
>> How do they get different nice levels - do you renice them? I would assume 
>> that all of them start at the same nice level as the parent. In the test 
>> program you posted there are no threads.
>> 
>> 
>>> The problem seems to arise when I'm both a) running across multiple 
>>> machines and b) running threads at differing nice levels (which often 
>>> happens as a result of our queueing setup).
>> 
>> This sounds like you are getting slots from different queues assigned to one 
>> and the same job. My experience: don't do it unless you need it. The problem 
>> is that SGE can't decide in its `qrsh -inherit ...` call which queue is the 
>> correct one for that particular call. As a result, all calls to a slave 
>> machine can end up in one and the same queue. Although this is not correct, 
>> it won't oversubscribe the node, as the overall slot count is usually 
>> limited already; it's more a matter of the names SGE sets in the job's 
>> environment:
>> 
>> https://arc.liv.ac.uk/trac/SGE/ticket/813
>> 
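>> One way to check where the slave tasks actually landed is `qstat -g t`, 
>> which lists the MASTER and SLAVE queue instances granted to a parallel job 
>> (a sketch; compare the granted queues against the ones you expected):
>> 
>>   qstat -g t
>> 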
>> As a result, the SGE-set $TMPDIR can differ between the master of the 
>> parallel job and a slave, as the name of the queue is part of $TMPDIR. When 
>> a wrong $TMPDIR is set on a node (by Open MPI's forwarding?), strange things 
>> can happen depending on the application.
>> 
>> Do you see the same behavior if you stay in one and the same queue across 
>> the machines? If you want to limit the number of PEs available to the user 
>> in your setup, you could request a PE by a wildcard; once a PE is selected, 
>> SGE will stay in that PE. Attaching each PE to only one queue thus avoids 
>> mixing slots from different queues (PE orte1 => all.q, PE orte2 => extra.q, 
>> and you request orte*); a sketch of such a setup follows below.
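>> 
>> A sketch (PE and queue names as in the example above; the slot count and 
>> job script are placeholders):
>> 
>>   qconf -sq all.q | grep pe_list    # should list only the orte1 PE
>>   qconf -sq extra.q | grep pe_list  # should list only the orte2 PE
>>   qsub -pe 'orte*' 8 job.sh         # SGE picks one matching PE and stays in it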
>> 
>> -- Reuti
>> 
>> 
>>> I can't guarantee that the problem *never* happens when I run across 
>>> multiple machines with all the threads un-niced, but I haven't been able to 
>>> reproduce that at will like I can for the other case.
>>> 
>>> -- 
>>> Joshua Baker-LePain
>>> QB3 Shared Cluster Sysadmin
>>> UCSF