# qconf -sq all.q
qname                 all.q
hostlist              @allhosts
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make mpi smp cores
rerun                 FALSE
slots                 1,[ibm021=8],[ibm037=8],[ibm038=8]
tmpdir                /tmp
shell                 /bin/sh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY
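One note on the "slots" line above: the leading "1" is the default for any
host in the hostlist without an explicit override, and the bracketed
entries pin per-host values. As a sketch (queue and host names taken from
this thread), a single queue instance can be inspected and adjusted with:

    # Show the configuration as it applies to one queue instance:
    qconf -sq all.q@ibm038

    # Set the slot count for just that queue instance:
    qconf -mattr queue slots 8 all.q@ibm038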
From: Christopher Heiny [mailto:christopherhe...@gmail.com]
Sent: Monday, December 12, 2016 12:22
To: John_Tai
Cc: users@gridengine.org; Reuti
Subject: Re: [gridengine users] CPU complex

On Dec 11, 2016 5:11 PM, "John_Tai" <john_...@smics.com> wrote:

> I associated the queue with the PE:
>
>     qconf -aattr queue pe_list cores all.q
>
> The only slots defined were in the all.q queue, plus the total slots in
> the PE:
>
>     # qconf -sp cores
>     pe_name            cores
>     slots              999
>     user_lists         NONE
>     xuser_lists        NONE
>
> Do I need to define slots in another way for each exec host? Is there a
> way to check the current free slots for a host, other than the qstat -f
> below?
>
>     # qstat -f
>     queuename              qtype resv/used/tot. load_avg arch       states
>     ------------------------------------------------------------------------
>     all.q@ibm021           BIP   0/0/8          0.02     lx-amd64
>     ------------------------------------------------------------------------
>     all.q@ibm037           BIP   0/0/8          0.00     lx-amd64
>     ------------------------------------------------------------------------
>     all.q@ibm038           BIP   0/0/8          0.00     lx-amd64

What is the output of the command "qconf -sq all.q"? (I think that's the
right one.)

Chris

-----Original Message-----
From: Reuti [mailto:re...@staff.uni-marburg.de]
Sent: Saturday, December 10, 2016 5:40
To: John_Tai
Cc: users@gridengine.org
Subject: Re: [gridengine users] CPU complex

On 09.12.2016 at 10:36, John_Tai wrote:

> 8 slots:
>
>     # qstat -f
>     queuename              qtype resv/used/tot. load_avg arch       states
>     ------------------------------------------------------------------------
>     all.q@ibm021           BIP   0/0/8          0.02     lx-amd64
>     ------------------------------------------------------------------------
>     all.q@ibm037           BIP   0/0/8          0.00     lx-amd64
>     ------------------------------------------------------------------------
>     all.q@ibm038           BIP   0/0/8          0.00     lx-amd64
>     ------------------------------------------------------------------------
>     pc.q@ibm021            BIP   0/0/1          0.02     lx-amd64
>     ------------------------------------------------------------------------
>     sim.q@ibm021           BIP   0/0/1          0.02     lx-amd64

Is there any limit on slots defined in the exec host, or in an RQS?

-- Reuti

>     ############################################################################
>      - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
>     ############################################################################
>         89 0.55500 xclock     johnt        qw    12/09/2016 15:14:25     2
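Reuti's question about exec-host limits and resource quota sets, and John's
earlier question about checking free slots, can both be answered with
standard qconf/qstat calls; a sketch, with a host name from this thread:

    # Show an exec host's definition, including any complex_values limit:
    qconf -se ibm021

    # List all resource quota sets (RQS) by name, then display them:
    qconf -srqsl
    qconf -srqs

    # Show the slots resource per queue instance:
    qstat -F slots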
> -----Original Message-----
> From: Reuti [mailto:re...@staff.uni-marburg.de]
> Sent: Friday, December 09, 2016 3:46
> To: John_Tai
> Cc: users@gridengine.org
> Subject: Re: [gridengine users] CPU complex
>
> Hi,
>
> On 09.12.2016 at 08:20, John_Tai wrote:
>
>> I've set up the PE, but I'm having problems submitting jobs.
>>
>> - Here's the PE I created:
>>
>>     # qconf -sp cores
>>     pe_name            cores
>>     slots              999
>>     user_lists         NONE
>>     xuser_lists        NONE
>>     start_proc_args    /bin/true
>>     stop_proc_args     /bin/true
>>     allocation_rule    $pe_slots
>>     control_slaves     FALSE
>>     job_is_first_task  TRUE
>>     urgency_slots      min
>>     accounting_summary FALSE
>>     qsort_args         NONE
>>
>> - I've then added this to all.q:
>>
>>     qconf -aattr queue pe_list cores all.q
>
> How many "slots" were defined in the queue definition for all.q?
>
> -- Reuti
>
>> - Now I submit a job:
>>
>>     # qsub -V -b y -cwd -now n -pe cores 2 -q all.q@ibm038 xclock
>>     Your job 89 ("xclock") has been submitted
>>     # qstat
>>     job-ID  prior   name     user   state submit/start at      queue  slots ja-task-ID
>>     ------------------------------------------------------------------------------------
>>         89 0.00000 xclock   johnt  qw    12/09/2016 15:14:25             2
>>     # qalter -w p 89
>>     Job 89 cannot run in PE "cores" because it only offers 0 slots
>>     verification: no suitable queues
>>     # qstat -f
>>     queuename              qtype resv/used/tot. load_avg arch       states
>>     ------------------------------------------------------------------------
>>     all.q@ibm038           BIP   0/0/8          0.00     lx-amd64
>>
>>     ############################################################################
>>      - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
>>     ############################################################################
>>         89 0.55500 xclock     johnt        qw    12/09/2016 15:14:25     2
>>
>> It looks like all.q@ibm038 should have 8 free slots, so why is it only
>> offering 0?
>>
>> Hope you can help me.
>> Thanks
>> John
>>
>> -----Original Message-----
>> From: Reuti [mailto:re...@staff.uni-marburg.de]
>> Sent: Monday, December 05, 2016 6:32
>> To: John_Tai
>> Cc: users@gridengine.org
>> Subject: Re: [gridengine users] CPU complex
>>
>> Hi,
>>
>>> On 05.12.2016 at 09:36, John_Tai <john_...@smics.com> wrote:
>>>
>>> Thank you so much for your reply!
>>>
>>>> Will you use the consumable virtual_free here instead of mem?
>>>
>>> Yes, I meant to write virtual_free, not mem. Apologies.
>>>
>>>> For parallel jobs you need to configure one (or several) so-called
>>>> PEs (Parallel Environments).
>>>
>>> My jobs are actually just one process which uses multiple cores; for
>>> example, in top one process "simv" is currently using 2 CPU cores (200%).
>>
>> Yes, then it's a parallel job for SGE. Although the entries for
>> start_proc_args resp. stop_proc_args can be left at their defaults, a PE
>> is the paradigm in SGE for a parallel job.
>>
>>>     PID  USER   PR  NI  VIRT  RES   SHR  S  %CPU %MEM    TIME+  COMMAND
>>>     3017 kelly  20   0  3353m 3.0g  165m R 200.0  0.6 15645:46  simv
>>>
>>> So I'm not sure a PE is suitable for my case, since it is not multiple
>>> parallel processes running at the same time. Am I correct?
>>>
>>> If so, I am trying to find a way to get SGE to keep track of the number
>>> of cores used, but I believe it only keeps track of the total CPU usage
>>> in %. I guess I could use this and the <total num cores> to get the
>>> <num of cores in use>, but how do I integrate it in SGE?
>>
>> You can specify the necessary number of cores for your job in the -pe
>> parameter, which can also be a range. The allocation granted by SGE you
>> can check inside the job script via $NHOSTS, $NSLOTS and $PE_HOSTFILE.
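A minimal job-script sketch (hypothetical, not from the thread) of reading
these variables back, using the "cores" PE defined above:

    #!/bin/sh
    # Request between 2 and 4 slots on a single host ($pe_slots rule):
    #$ -pe cores 2-4
    #$ -cwd

    echo "hosts granted: $NHOSTS"
    echo "slots granted: $NSLOTS"
    # One line per host: hostname  slots  queue  processor-range
    cat "$PE_HOSTFILE"

    # A multi-threaded program could then be limited to the granted cores,
    # e.g. (hypothetical invocation): simv -threads "$NSLOTS"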
>> Having this setup, SGE will track the number of used cores per machine.
>> The available ones you define in the queue definition. In case you have
>> more than one queue per exec host, you need to set up in addition an
>> overall limit of cores which can be used at the same time, to avoid
>> oversubscription.
>>
>> -- Reuti
>>
>>> Thank you again for your help.
>>>
>>> John
>>>
>>> -----Original Message-----
>>> From: Reuti [mailto:re...@staff.uni-marburg.de]
>>> Sent: Monday, December 05, 2016 4:21
>>> To: John_Tai
>>> Cc: users@gridengine.org
>>> Subject: Re: [gridengine users] CPU complex
>>>
>>> Hi,
>>>
>>> On 05.12.2016 at 08:00, John_Tai wrote:
>>>
>>>> Newbie here, hoping to understand SGE usage.
>>>>
>>>> I've successfully configured virtual_free as a complex for telling SGE
>>>> how much memory is needed when submitting a job, as described here:
>>>>
>>>> https://docs.oracle.com/cd/E19957-01/820-0698/6ncdvjclk/index.html#i1000029
>>>>
>>>> How do I do the same for telling SGE how many CPU cores a job needs?
>>>> For example:
>>>>
>>>>     qsub -l mem=24G,cpu=4 myjob
>>>
>>> Will you use the consumable virtual_free here instead of mem?
>>>
>>>> Obviously I'd need SGE to keep track of the actual CPU utilization on
>>>> the host, just as virtual_free is tracked independently of the SGE jobs.
>>>
>>> For parallel jobs you need to configure one (or several) so-called PEs
>>> (Parallel Environments). Their purpose is to make preparations for
>>> parallel jobs, like rearranging the list of granted slots, preparing
>>> shared directories between the nodes, ...
>>>
>>> These PEs were of higher importance in former times, when parallel
>>> libraries were not programmed to integrate automatically with SGE for a
>>> tight integration. Your submissions could read:
>>>
>>>     qsub -pe smp 4 myjob    # allocation_rule $pe_slots, control_slaves TRUE
>>>     qsub -pe orte 16 myjob  # allocation_rule $round_robin, control_slaves TRUE
>>>
>>> where smp resp. orte is the chosen parallel environment for OpenMP resp.
>>> Open MPI. Its settings are explained in `man sge_pe`, the "-pe" parameter
>>> of the submission command in `man qsub`.
>>>
>>> -- Reuti
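For the overall limit Reuti mentions above (so that all.q, pc.q and sim.q
together cannot oversubscribe a host's cores), two common approaches are a
"slots" entry in the exec host's complex_values, or a resource quota set. A
sketch, with the host name from this thread and an assumed cap of 8 cores:

    # Option 1: cap the total slots usable on one exec host across all queues:
    qconf -mattr exechost complex_values slots=8 ibm021

    # Option 2: a resource quota set (added with "qconf -arqs"), e.g.:
    #   {
    #      name         max_host_slots
    #      description  Cap slots across all queues on each host
    #      enabled      TRUE
    #      limit        hosts {*} to slots=8
    #   }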
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users