On Mar 15, 2012, at 3:50 PM, Ralph Castain wrote:

> 
> On Mar 15, 2012, at 8:46 AM, Reuti wrote:
> 
>> On Mar 15, 2012, at 3:37 PM, Ralph Castain wrote:
>> 
>>> Just to be clear: I take it that the first entry is the host name, and the 
>>> second is the number of slots allocated on that host?
>> 
>> This is correct.
>> 
>> 
>>> FWIW: I see the problem. Our parser was apparently written assuming every 
>>> line was a unique host, so it doesn't even check to see if there is 
>>> duplication. Easy fix - can shoot it to you today.
>> 
>> But even with the fix, the nice value will be the same for all processes 
>> forked there: either they all get the nice value of the low priority queue, 
>> or they all get that of the high priority queue.
> 
> Agreed - nothing I can do about that, though. We only do the one qrsh call, 
> so the daemons are going to fall into a single queue, and so will all their 
> children. In this scenario, it isn't clear to me (from this discussion) that 
> I can control which queue gets used

Correct.


> - can I?

No. As posted, I created an issue for it. But if it did work, then you would 
already get a different $TMPDIR for each queue.


> Should I?

I can't speak for the community. Personally I would say: don't distribute 
parallel jobs among different queues at all, as some applications use an 
internal mechanism to distribute the environment variables of the master 
process to the slaves (even if SGE's `qrsh -inherit ...` is called without -V, 
and even if Open MPI is not told to forward any specific environment 
variable). If you have a custom application this can of course work, but with 
closed source ones you can only test and learn from experience whether it 
works or not.

Not to mention the timing issues of differently niced processes. Adjusting the 
OP's SGE setup would be the smarter way IMO.

If it's fixed in Open MPI to add up all the granted slots on one machine, some 
users may think it's an Open MPI error to attach everything to one queue only, 
as they expect the different queues to be used. So this "workaround" should be 
noted somewhere: >>As it's not possible to reach a specific queue on a slave 
machine with SGE's tight integration command (`qrsh -inherit ...`), as a 
workaround the slot counts of the different queues are added up inside SGE's 
$PE_HOSTFILE, and all processes are started in the queue SGE chooses for the 
first issued `qrsh -inherit ...`. Which one is taken can't be predicted, 
though.<<
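
For illustration only, here is a minimal sketch of such slot aggregation as it 
could be done in a start_proc_args wrapper (hypothetical script; it assumes 
the $PE_HOSTFILE format quoted below and Open MPI's plain hostfile syntax 
"host slots=N"):

#!/bin/sh
# Hypothetical wrapper: sum the slot counts of duplicate host entries
# in SGE's $PE_HOSTFILE and write a plain machinefile for Open MPI.
awk '{ slots[$1] += $2 }
     END { for (host in slots) print host, "slots=" slots[host] }' \
    "$PE_HOSTFILE" > "$TMPDIR/machines"

The job script would then point mpirun to this file, e.g. 
`mpirun -machinefile $TMPDIR/machines ...`, instead of relying on the built-in 
SGE support.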

-- Reuti


>>> On Mar 15, 2012, at 6:53 AM, Reuti wrote:
>>> 
>>>> On Mar 15, 2012, at 5:22 AM, Joshua Baker-LePain wrote:
>>>> 
>>>>> On Wed, 14 Mar 2012 at 5:50pm, Ralph Castain wrote
>>>>> 
>>>>>> On Mar 14, 2012, at 5:44 PM, Reuti wrote:
>>>>> 
>>>>>>> (I was just typing when Ralph's message came in: I can confirm this. To 
>>>>>>> avoid it, Open MPI would have to collect all lines of the hostfile 
>>>>>>> which refer to the same machine. SGE creates an entry for each 
>>>>>>> queue/host pair in the machine file.)
>>>>>> 
>>>>>> Hmmm…I can take a look at the allocator module and see why we aren't 
>>>>>> doing it. Would the host names be the same for the two queues?
>>>>> 
>>>>> I can't speak authoritatively like Reuti can, but here's what a hostfile
>>>>> looks like on my cluster (note that all our name resolution is done via 
>>>>> /etc/hosts -- there's no DNS involved):
>>>>> 
>>>>> iq103 8 lab.q@iq103 <NULL>
>>>>> iq103 1 test.q@iq103 <NULL>
>>>>> iq104 8 lab.q@iq104 <NULL>
>>>>> iq104 1 test.q@iq104 <NULL>
>>>>> opt221 2 lab.q@opt221 <NULL>
>>>>> opt221 1 test.q@opt221 <NULL>
>>>> 
>>>> Yes, exactly: this needs to be parsed and all entries therein for one and 
>>>> the same machine added up.
>>>> 
>>>> If you need it right away, it could be put in a wrapper for the PE's 
>>>> start_proc_args (with Open MPI compiled without SGE support), so that a 
>>>> custom-built machinefile can be used. In this case the rsh (resp. ssh) 
>>>> call also needs to be caught.
>>>> 
>>>> Often the opposite is desired in an SGE setup: tune it so that all slots 
>>>> come from one queue only.
>>>> 
>>>> But I still wonder whether it is possible to tune your setup in a similar 
>>>> way: allow one slot more in the high priority queue (long.q) in case it's 
>>>> a parallel job, with an RQS (assuming 8 cores with one core of 
>>>> oversubscription):
>>>> 
>>>> limit queues long.q pes * to slots=9
>>>> limit queues long.q to slots=8
>>>> 
>>>> while you have an additional short.q (the low priority queue) there with 
>>>> one slot. The overall limit is still set at the exechost level to 9. The 
>>>> PE is then only attached to long.q.
>>>> 
>>>> -- Reuti
>>>> 
>>>> PS: In your example you also had the case of 2 slots in the low priority 
>>>> queue; what is the actual setup in your cluster?
>>>> 
>>>> 
>>>>>>> @Ralph: it could work if SGE had a facility to request the desired 
>>>>>>> queue in `qrsh -inherit ...`, because then the $TMPDIR would again be 
>>>>>>> unique for each orted (assuming it uses different ports for each).
>>>>>> 
>>>>>> Gotcha! I suspect getting the allocator to handle this cleanly is the 
>>>>>> better solution, though.
>>>>> 
>>>>> If I can help (testing patches, e.g.), let me know.
>>>>> 
>>>>> -- 
>>>>> Joshua Baker-LePain
>>>>> QB3 Shared Cluster Sysadmin
>>>>> UCSF