On 09.12.2016 at 10:36, John_Tai wrote:
> 8 slots:
>
> # qstat -f
> queuename                      qtype resv/used/tot. load_avg arch          states
> ---------------------------------------------------------------------------------
> all.q@ibm021                   BIP   0/0/8          0.02     lx-amd64
> ---------------------------------------------------------------------------------
> all.q@ibm037                   BIP   0/0/8          0.00     lx-amd64
> ---------------------------------------------------------------------------------
> all.q@ibm038                   BIP   0/0/8          0.00     lx-amd64
> ---------------------------------------------------------------------------------
> pc.q@ibm021                    BIP   0/0/1          0.02     lx-amd64
> ---------------------------------------------------------------------------------
> sim.q@ibm021                   BIP   0/0/1          0.02     lx-amd64
Is there any limit on slots defined in the exechost, or in an RQS?

-- Reuti

>
> ############################################################################
>  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
> ############################################################################
>      89 0.55500 xclock     johnt        qw    12/09/2016 15:14:25     2
>
>
> -----Original Message-----
> From: Reuti [mailto:re...@staff.uni-marburg.de]
> Sent: Friday, December 09, 2016 3:46
> To: John_Tai
> Cc: users@gridengine.org
> Subject: Re: [gridengine users] CPU complex
>
> Hi,
>
> On 09.12.2016 at 08:20, John_Tai wrote:
>
>> I've set up a PE but I'm having problems submitting jobs.
>>
>> - Here's the PE I created:
>>
>> # qconf -sp cores
>> pe_name            cores
>> slots              999
>> user_lists         NONE
>> xuser_lists        NONE
>> start_proc_args    /bin/true
>> stop_proc_args     /bin/true
>> allocation_rule    $pe_slots
>> control_slaves     FALSE
>> job_is_first_task  TRUE
>> urgency_slots      min
>> accounting_summary FALSE
>> qsort_args         NONE
>>
>> - I've then added this to all.q:
>>
>> qconf -aattr queue pe_list cores all.q
>
> How many "slots" were defined in the queue definition for all.q?
>
> -- Reuti
>
>
>> - Now I submit a job:
>>
>> # qsub -V -b y -cwd -now n -pe cores 2 -q all.q@ibm038 xclock
>> Your job 89 ("xclock") has been submitted
>> # qstat
>> job-ID  prior   name       user         state submit/start at     queue    slots ja-task-ID
>> -----------------------------------------------------------------------------------------------------------------
>>      89 0.00000 xclock     johnt        qw    12/09/2016 15:14:25              2
>> # qalter -w p 89
>> Job 89 cannot run in PE "cores" because it only offers 0 slots
>> verification: no suitable queues
>> # qstat -f
>> queuename                      qtype resv/used/tot. load_avg arch          states
>> ---------------------------------------------------------------------------------
>> all.q@ibm038                   BIP   0/0/8          0.00     lx-amd64
>>
>> ############################################################################
>>  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
>> ############################################################################
>>      89 0.55500 xclock     johnt        qw    12/09/2016 15:14:25     2
>>
>>
>> ----------------------------------------------------
>>
>> It looks like all.q@ibm038 should have 8 free slots, so why is it only
>> offering 0?
>>
>> Hope you can help me.
>> Thanks
>> John
>>
>>
>> -----Original Message-----
>> From: Reuti [mailto:re...@staff.uni-marburg.de]
>> Sent: Monday, December 05, 2016 6:32
>> To: John_Tai
>> Cc: users@gridengine.org
>> Subject: Re: [gridengine users] CPU complex
>>
>> Hi,
>>
>>> On 05.12.2016 at 09:36, John_Tai <john_...@smics.com> wrote:
>>>
>>> Thank you so much for your reply!
>>>
>>>>> Will you use the consumable virtual_free here instead of mem?
>>>
>>> Yes, I meant to write virtual_free, not mem. Apologies.
>>>
>>>>> For parallel jobs you need to configure a (or some) so-called PE
>>>>> (Parallel Environment).
>>>
>>> My jobs are actually just one process which uses multiple cores; for
>>> example, in top one process "simv" is currently using 2 CPU cores (200%).
>>
>> Yes, then it's a parallel job for SGE. Although the entries for
>> start_proc_args resp. stop_proc_args can be left at the default, a
>> PE is the paradigm in SGE for a parallel job.
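[Side note: the limits Reuti asks about above can be inspected with qconf. A minimal sketch, assuming the queue and host names from the qstat output (all.q, ibm038) and a stock SGE installation:

    qconf -sq all.q | grep slots   # slots defined in the queue itself
    qconf -se ibm038               # exechost definition; a "slots=..." entry under
                                   # complex_values would cap slots on this host
    qconf -srqsl                   # list resource quota sets, if any
    qconf -srqs                    # show the RQS rules that might limit slots

If the queue's slots entry, the exechost complex_values and every RQS allow at least 2 free slots on ibm038, the "-pe cores 2" request above should verify.]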
>>>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+   COMMAND
>>>  3017 kelly     20   0 3353m 3.0g 165m R 200.0  0.6  15645:46  simv
>>>
>>> So I'm not sure PE is suitable for my case, since it is not multiple
>>> parallel processes running at the same time. Am I correct?
>>>
>>> If so, I am trying to find a way to get SGE to keep track of the number of
>>> cores used, but I believe it only keeps track of the total CPU usage in %.
>>> I guess I could use this and the <total num cores> to get the <num of
>>> cores in use>, but how do I integrate it in SGE?
>>
>> You can specify the necessary number of cores for your job with the -pe
>> parameter, which can also be a range. The allocation granted by SGE you can
>> check in the job script via $NHOSTS, $NSLOTS and $PE_HOSTFILE.
>>
>> With this setup, SGE will track the number of used cores per machine. The
>> available ones you define in the queue definition. In case you have more
>> than one queue per exechost, we need to set up in addition an overall limit
>> of cores which can be used at the same time, to avoid oversubscription.
>>
>> -- Reuti
>>
>>> Thank you again for your help.
>>>
>>> John
>>>
>>> -----Original Message-----
>>> From: Reuti [mailto:re...@staff.uni-marburg.de]
>>> Sent: Monday, December 05, 2016 4:21
>>> To: John_Tai
>>> Cc: users@gridengine.org
>>> Subject: Re: [gridengine users] CPU complex
>>>
>>> Hi,
>>>
>>> On 05.12.2016 at 08:00, John_Tai wrote:
>>>
>>>> Newbie here, hoping to understand SGE usage.
>>>>
>>>> I've successfully configured virtual_free as a complex for telling SGE how
>>>> much memory is needed when submitting a job, as described here:
>>>>
>>>> https://docs.oracle.com/cd/E19957-01/820-0698/6ncdvjclk/index.html#i1000029
>>>>
>>>> How do I do the same for telling SGE how many CPU cores a job needs? For
>>>> example:
>>>>
>>>> qsub -l mem=24G,cpu=4 myjob
>>>
>>> Will you use the consumable virtual_free here instead of mem?
>>>
>>>
>>>> Obviously I'd need SGE to keep track of the actual CPU utilization on
>>>> the host, just as virtual_free is being tracked independently of the SGE
>>>> jobs.
>>>
>>> For parallel jobs you need to configure a (or some) so-called PE (Parallel
>>> Environment). Its purpose is to make preparations for the parallel jobs,
>>> like rearranging the list of granted slots, preparing shared directories
>>> between the nodes, ...
>>>
>>> These PEs were of higher importance in former times, when parallel
>>> libraries were not programmed to integrate automatically with SGE for a
>>> tight integration. Your submissions could read:
>>>
>>> qsub -pe smp 4 myjob    # allocation_rule $pe_slots, control_slaves true
>>> qsub -pe orte 16 myjob  # allocation_rule $round_robin, control_slaves true
>>>
>>> where smp resp. orte is the chosen parallel environment for OpenMP resp.
>>> Open MPI. Their settings are explained in `man sge_pe`, the "-pe" parameter
>>> of the submission command in `man qsub`.
>>>
>>> -- Reuti
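[Side note: a minimal job script sketch illustrating the $NHOSTS/$NSLOTS/$PE_HOSTFILE variables Reuti mentions, assuming the "cores" PE defined earlier in the thread and a placeholder binary ./myjob; whether the application honours OMP_NUM_THREADS is an assumption for illustration only:

    #!/bin/sh
    #$ -pe cores 2            # request 2 slots through the PE
    #$ -l virtual_free=24G    # memory request via the consumable from the thread

    echo "hosts granted: $NHOSTS"    # 1 host with allocation_rule $pe_slots
    echo "slots granted: $NSLOTS"    # total slots granted to this job
    cat "$PE_HOSTFILE"               # host/slot list written by SGE

    # pass the granted core count on to the application
    OMP_NUM_THREADS=$NSLOTS
    export OMP_NUM_THREADS
    ./myjob

Submitted with "qsub myjob.sh", SGE accounts the 2 slots against the queue and host for the lifetime of the job, which is how the per-machine core usage gets tracked.]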
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users