On 09.12.2016 at 10:36, John_Tai wrote:
> 8 slots:
>
> # qstat -f
> queuename                      qtype resv/used/tot. load_avg arch          states
> ---------------------------------------------------------------------------------
> all.q@ibm021                   BIP   0/0/8          0.02     lx-amd64
> ---------------------------------------------------------------------------------
> all.q@ibm037                   BIP   0/0/8          0.00     lx-amd64
> ---------------------------------------------------------------------------------
> all.q@ibm038                   BIP   0/0/8          0.00     lx-amd64
> ---------------------------------------------------------------------------------
> pc.q@ibm021                    BIP   0/0/1          0.02     lx-amd64
> ---------------------------------------------------------------------------------
> sim.q@ibm021                   BIP   0/0/1          0.02     lx-amd64
Is there any limit on slots defined in the exechost, or in an RQS?

-- Reuti

>
> ############################################################################
>  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
> ############################################################################
>      89 0.55500 xclock     johnt        qw    12/09/2016 15:14:25     2
>
>
> -----Original Message-----
> From: Reuti [mailto:re...@staff.uni-marburg.de]
> Sent: Friday, December 09, 2016 3:46
> To: John_Tai
> Cc: users@gridengine.org
> Subject: Re: [gridengine users] CPU complex
>
> Hi,
>
> On 09.12.2016 at 08:20, John_Tai wrote:
>
>> I've set up a PE but I'm having problems submitting jobs.
>>
>> - Here's the PE I created:
>>
>> # qconf -sp cores
>> pe_name            cores
>> slots              999
>> user_lists         NONE
>> xuser_lists        NONE
>> start_proc_args    /bin/true
>> stop_proc_args     /bin/true
>> allocation_rule    $pe_slots
>> control_slaves     FALSE
>> job_is_first_task  TRUE
>> urgency_slots      min
>> accounting_summary FALSE
>> qsort_args         NONE
>>
>> - I've then added this to all.q:
>>
>> qconf -aattr queue pe_list cores all.q
>
> How many "slots" were defined in the queue definition for all.q?
>
> -- Reuti
>
>
>> - Now I submit a job:
>>
>> # qsub -V -b y -cwd -now n -pe cores 2 -q all.q@ibm038 xclock
>> Your job 89 ("xclock") has been submitted
>> # qstat
>> job-ID  prior   name       user         state submit/start at     queue    slots ja-task-ID
>> -----------------------------------------------------------------------------------------------------------------
>>      89 0.00000 xclock     johnt        qw    12/09/2016 15:14:25              2
>> # qalter -w p 89
>> Job 89 cannot run in PE "cores" because it only offers 0 slots
>> verification: no suitable queues
>> # qstat -f
>> queuename                      qtype resv/used/tot. load_avg arch          states
>> ---------------------------------------------------------------------------------
>> all.q@ibm038                   BIP   0/0/8          0.00     lx-amd64
>>
>> ############################################################################
>>  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
>> ############################################################################
>>      89 0.55500 xclock     johnt        qw    12/09/2016 15:14:25     2
>>
>>
>> ----------------------------------------------------
>>
>> It looks like all.q@ibm038 should have 8 free slots, so why is it only
>> offering 0?
>>
>> Hope you can help me.
>> Thanks
>> John
>>
>>
>> -----Original Message-----
>> From: Reuti [mailto:re...@staff.uni-marburg.de]
>> Sent: Monday, December 05, 2016 6:32
>> To: John_Tai
>> Cc: users@gridengine.org
>> Subject: Re: [gridengine users] CPU complex
>>
>> Hi,
>>
>>> On 05.12.2016 at 09:36, John_Tai <john_...@smics.com> wrote:
>>>
>>> Thank you so much for your reply!
>>>
>>>>> Will you use the consumable virtual_free here instead of mem?
>>>
>>> Yes, I meant to write virtual_free, not mem. Apologies.
>>>
>>>>> For parallel jobs you need to configure a (or some) so-called PE
>>>>> (Parallel Environment).
>>>
>>> My jobs are actually just one process which uses multiple cores; for
>>> example, in top one process "simv" is currently using 2 CPU cores (200%).
>>
>> Yes, then it's a parallel job for SGE. Although the entries for
>> start_proc_args resp. stop_proc_args can be left at the default, a
>> PE is the paradigm in SGE for a parallel job.
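[Side note: the limits Reuti asks about above can be inspected with qconf. A minimal sketch, assuming the queue and host names from the qstat output (all.q, ibm038) and a stock SGE installation:

    qconf -sq all.q | grep slots   # slots defined in the queue itself
    qconf -se ibm038               # exechost definition; a "slots=..." entry under
                                   # complex_values would cap slots on this host
    qconf -srqsl                   # list resource quota sets, if any
    qconf -srqs                    # show the RQS rules that might limit slots

If the queue's slots entry, the exechost complex_values and every RQS allow at least 2 free slots on ibm038, the "-pe cores 2" request above should verify.]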
>>>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+   COMMAND
>>>  3017 kelly     20   0 3353m 3.0g 165m R 200.0  0.6  15645:46  simv
>>>
>>> So I'm not sure PE is suitable for my case, since it is not multiple
>>> parallel processes running at the same time. Am I correct?
>>>
>>> If so, I am trying to find a way to get SGE to keep track of the number of
>>> cores used, but I believe it only keeps track of the total CPU usage in %.
>>> I guess I could use this and the <total num cores> to get the <num of
>>> cores in use>, but how do I integrate it in SGE?
>>
>> You can specify the necessary number of cores for your job with the -pe
>> parameter, which can also be a range. The allocation granted by SGE you can
>> check in the job script via $NHOSTS, $NSLOTS and $PE_HOSTFILE.
>>
>> With this setup, SGE will track the number of used cores per machine. The
>> available ones you define in the queue definition. In case you have more
>> than one queue per exechost, we need to set up in addition an overall limit
>> of cores which can be used at the same time, to avoid oversubscription.
>>
>> -- Reuti
>>
>>> Thank you again for your help.
>>>
>>> John
>>>
>>> -----Original Message-----
>>> From: Reuti [mailto:re...@staff.uni-marburg.de]
>>> Sent: Monday, December 05, 2016 4:21
>>> To: John_Tai
>>> Cc: users@gridengine.org
>>> Subject: Re: [gridengine users] CPU complex
>>>
>>> Hi,
>>>
>>> On 05.12.2016 at 08:00, John_Tai wrote:
>>>
>>>> Newbie here, hoping to understand SGE usage.
>>>>
>>>> I've successfully configured virtual_free as a complex for telling SGE how
>>>> much memory is needed when submitting a job, as described here:
>>>>
>>>> https://docs.oracle.com/cd/E19957-01/820-0698/6ncdvjclk/index.html#i1000029
>>>>
>>>> How do I do the same for telling SGE how many CPU cores a job needs? For
>>>> example:
>>>>
>>>> qsub -l mem=24G,cpu=4 myjob
>>>
>>> Will you use the consumable virtual_free here instead of mem?
>>>
>>>
>>>> Obviously I'd need SGE to keep track of the actual CPU utilization on
>>>> the host, just as virtual_free is being tracked independently of the SGE
>>>> jobs.
>>>
>>> For parallel jobs you need to configure a (or some) so-called PE (Parallel
>>> Environment). Its purpose is to make preparations for the parallel jobs,
>>> like rearranging the list of granted slots, preparing shared directories
>>> between the nodes, ...
>>>
>>> These PEs were of higher importance in former times, when parallel
>>> libraries were not programmed to integrate automatically with SGE for a
>>> tight integration. Your submissions could read:
>>>
>>> qsub -pe smp 4 myjob    # allocation_rule $pe_slots, control_slaves true
>>> qsub -pe orte 16 myjob  # allocation_rule $round_robin, control_slaves true
>>>
>>> where smp resp. orte is the chosen parallel environment for OpenMP resp.
>>> Open MPI. Their settings are explained in `man sge_pe`, the "-pe" parameter
>>> of the submission command in `man qsub`.
>>>
>>> -- Reuti
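[Side note: a minimal job script sketch illustrating the $NHOSTS/$NSLOTS/$PE_HOSTFILE variables Reuti mentions, assuming the "cores" PE defined earlier in the thread and a placeholder binary ./myjob; whether the application honours OMP_NUM_THREADS is an assumption for illustration only:

    #!/bin/sh
    #$ -pe cores 2            # request 2 slots through the PE
    #$ -l virtual_free=24G    # memory request via the consumable from the thread

    echo "hosts granted: $NHOSTS"    # 1 host with allocation_rule $pe_slots
    echo "slots granted: $NSLOTS"    # total slots granted to this job
    cat "$PE_HOSTFILE"               # host/slot list written by SGE

    # pass the granted core count on to the application
    OMP_NUM_THREADS=$NSLOTS
    export OMP_NUM_THREADS
    ./myjob

Submitted with "qsub myjob.sh", SGE accounts the 2 slots against the queue and host for the lifetime of the job, which is how the per-machine core usage gets tracked.]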
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users