On 20.12.2016 at 23:42, Christopher Black wrote:
> We have found that the behavior that multiplies consumable memory resource
> requests by the number of PE slots can be confusing (and requires extra math in
> automation scripts), so we have the complex consumable value set to “JOB”
> rather than “YES”. When this is done (at least on SoGE), the memory requested
> is NOT multiplied by the number of slots. We also use h_vmem rather than
> virtual_free.
Correct, it's not multiplied. But only the master exechost will get its memory reduced in the bookkeeping. The slave exechosts might still show too high a value for the available memory, I fear.

-- Reuti

> Best,
> Chris
>
> On 12/20/16, 5:11 AM, "users-boun...@gridengine.org on behalf of Reuti"
> <users-boun...@gridengine.org on behalf of re...@staff.uni-marburg.de> wrote:
>
>> On 20.12.2016 at 02:45, John_Tai <john_...@smics.com> wrote:
>>
>> I spoke too soon. I can request PE and virtual_free separately, but I cannot
>> request both:
>>
>> # qsub -V -b y -cwd -now n -pe cores 7 -l mem=10G -q all.q@ibm037 xclock
>
> Above you request "mem" (which is a snapshot of the actual usage and may
> vary over the runtime of other jobs [unless they request the total amount
> already at the beginning of the job and stay with it]).
>
>> Your job 180 ("xclock") has been submitted
>> # qstat
>> job-ID  prior    name    user   state  submit/start at      queue  slots  ja-task-ID
>> -------------------------------------------------------------------------------------
>>    180  0.55500  xclock  johnt  qw     12/20/2016 09:43:41          7
>> # qstat -j 180
>> ==============================================================
>> job_number:                 180
>> exec_file:                  job_scripts/180
>> submission_time:            Tue Dec 20 09:43:41 2016
>> owner:                      johnt
>> uid:                        162
>> group:                      sa
>> gid:                        4563
>> sge_o_home:                 /home/johnt
>> sge_o_log_name:             johnt
>> sge_o_path:                 /home/sge/sge8.1.9-1.el5/bin:/home/sge/sge8.1.9-1.el5/bin/lx-amd64:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/home/johnt/bin:.
>> sge_o_shell:                /bin/tcsh
>> sge_o_workdir:              /home/johnt/sge8
>> sge_o_host:                 ibm005
>> account:                    sge
>> cwd:                        /home/johnt/sge8
>> hard resource_list:         virtual_free=10G
>
> 10G times 7 = 70 GB
>
> Does the node have this amount of memory installed, and is it defined this
> way in `qconf -me ibm037`?
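As a rough illustration of the consumable arithmetic discussed here, the following Python sketch models how much of a per-slot memory request each execution host is charged. It is a simplified model built only from the behavior described in this thread, not Grid Engine source code: with the consumable column set to YES the request is multiplied by the slots granted on each host, while with JOB it is charged once, on the master host only. The helper name `host_debits` and the host names `hostA`/`hostB` are hypothetical.

```python
from typing import Dict

GIB = 1024 ** 3  # the "G" multiplier: 1024*1024*1024 bytes


def host_debits(request_bytes: int, slots_per_host: Dict[str, int],
                master: str, consumable: str) -> Dict[str, int]:
    """Return the bytes of a memory consumable charged to each host.

    Hypothetical, simplified model (not SGE source code):
      - "YES": the per-slot request is multiplied by the slots on each host.
      - "JOB": the request is charged once, and only on the master host.
    """
    if consumable == "YES":
        return {host: request_bytes * n for host, n in slots_per_host.items()}
    if consumable == "JOB":
        return {host: (request_bytes if host == master else 0)
                for host in slots_per_host}
    raise ValueError(f"unsupported consumable type: {consumable!r}")


# John's job: -pe cores 7 -l virtual_free=10G, all 7 slots on ibm037
single = host_debits(10 * GIB, {"ibm037": 7}, master="ibm037", consumable="YES")
print(single["ibm037"] // GIB)  # 70, i.e. "10G times 7"

# Hypothetical PE job spread over two hosts, consumable set to JOB
spread = host_debits(10 * GIB, {"hostA": 4, "hostB": 3}, master="hostA",
                     consumable="JOB")
print({h: b // GIB for h, b in spread.items()})  # {'hostA': 10, 'hostB': 0}
```

The second call also reflects Reuti's caveat: under JOB, a slave host such as `hostB` is charged nothing, so its bookkeeping can still report more free memory than is really available.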
>
> -- Reuti
>
>
>> mail_list:                  johnt@ibm005
>> notify:                     FALSE
>> job_name:                   xclock
>> jobshare:                   0
>> hard_queue_list:            all.q@ibm037
>> env_list:                   TERM=xterm,DISPLAY=dsls11:3. [..]
>> script_file:                xclock
>> parallel environment:       cores range: 7
>> binding:                    NONE
>> job_type:                   binary
>> scheduling info:            cannot run in queue "sim.q" because it is not contained in its hard queue list (-q)
>>                             cannot run in queue "pc.q" because it is not contained in its hard queue list (-q)
>>                             cannot run in PE "cores" because it only offers 0 slots
>>
>>
>> -----Original Message-----
>> From: Reuti [mailto:re...@staff.uni-marburg.de]
>> Sent: Saturday, December 17, 2016 10:16
>> To: Reuti
>> Cc: John_Tai; users@gridengine.org; Coleman, Marcus [JRDUS Non-J&J]
>> Subject: Re: [gridengine users] John's cores pe (Was: users Digest...)
>>
>>
>> On 17.12.2016 at 11:34, Reuti wrote:
>>
>>>
>>> On 17.12.2016 at 02:01, John_Tai wrote:
>>>
>>>> It is working!! Thank you to all who replied to me and helped me figure
>>>> this out.
>>>>
>>>> I meant to set the default to 2G, so that was my mistake. I changed it to:
>>>>
>>>> virtual_free   mem   MEMORY   <=   YES   YES   2G   0
>>>
>>> That's strange. For me, a plain "2" has always meant two bytes. An "h_vmem"
>>> of 2 bytes would crash the job instantly when it got scheduled, but for
>>> "virtual_free" (which is only guidance for SGE on how to distribute jobs) it
>>> shouldn't hinder the scheduling at all.
>>>
>>> `man sge_types` also lists:
>>>
>>>     If no multiplier is present, the value is just counted in bytes.
>>
>> We have set "-w e" in /usr/sge/default/common/sge_request, and then I even
>> get "Unable to run job: error: no suitable queues." This happens whether
>> the low 2-byte value is specified in the complex definition (`qconf -mc`) or
>> on the command line as "-l virtual_free=2".
>>
>> It turns out that the minimum value accepted is 33.
>>
>> -- Reuti
>>
>>
>>>
>>>> And it's working now.
>>>> Although I'm not sure why it affected the PE.
>>>>
>>>> Also, I didn't set a global one. What is the purpose of the global one?
>>>> Should I set it?
>>>
>>> No, it was just one more place I would have checked. The global complexes
>>> therein can, for example, be used to limit the number of licenses of an
>>> application you have which float across the cluster (one might prefer
>>> to put such a limit in an RQS, though).
>>>
>>> If you had set it up there, it would have been the "overall limit of
>>> memory which can be used in the complete cluster at the same time".
>>>
>>> -- Reuti
>>>
>>>
>>>> # qconf -se global
>>>> hostname              global
>>>> load_scaling          NONE
>>>> complex_values        NONE
>>>> load_values           NONE
>>>> processors            0
>>>> user_lists            NONE
>>>> xuser_lists           NONE
>>>> projects              NONE
>>>> xprojects             NONE
>>>> usage_scaling         NONE
>>>> report_variables      NONE
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Reuti [mailto:re...@staff.uni-marburg.de]
>>>> Sent: Friday, December 16, 2016 7:36
>>>> To: John_Tai
>>>> Cc: Christopher Heiny; users@gridengine.org; Coleman, Marcus [JRDUS Non-J&J]
>>>> Subject: Re: [gridengine users] John's cores pe (Was: users Digest...)
>>>>
>>>>
>>>>> On 16.12.2016 at 09:53, John_Tai <john_...@smics.com> wrote:
>>>>>
>>>>> virtual_free   mem   MEMORY   <=   YES   YES   2   0
>>>>
>>>> This would mean that the default consumption is 2 bytes. I had feared
>>>> that a high value was configured here. A more suitable default would be
>>>> 1G or so.
>>>>
>>>> Is there any virtual_free complex defined at the global level? Check with
>>>> `qconf -se global`.
>>>>
>>>> -- Reuti
>>>> ________________________________
>>>>
>>>> This email (including its attachments, if any) may be confidential and
>>>> proprietary information of SMIC, and intended only for the use of the
>>>> named recipient(s) above. Any unauthorized use or disclosure of this email
>>>> is strictly prohibited.
>>>> If you are not the intended recipient(s), please notify the sender
>>>> immediately and delete this email from your computer.
>>>>
>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> users@gridengine.org
>>> https://gridengine.org/mailman/listinfo/users
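The unit rule quoted from `man sge_types` explains why the complex default of plain `2` meant two bytes. The Python sketch below converts a memory specifier to bytes, assuming the multipliers listed for memory specifiers in that man page (k, m, g as powers of 1000; K, M, G as powers of 1024). It handles only plain decimal integers; hexadecimal or octal constants and any other forms SGE may accept are deliberately left out. The function name `memory_to_bytes` is hypothetical.

```python
import re

# Multipliers for memory specifiers as listed in `man sge_types`:
# lower case = powers of 1000, upper case = powers of 1024.
_MULTIPLIERS = {
    "": 1,
    "k": 1000, "K": 1024,
    "m": 1000 ** 2, "M": 1024 ** 2,
    "g": 1000 ** 3, "G": 1024 ** 3,
}

# A decimal integer optionally followed by a single multiplier letter.
_SPEC = re.compile(r"([0-9]+)([kKmMgG]?)")


def memory_to_bytes(spec: str) -> int:
    """Convert a decimal memory specifier like '2', '512M' or '10G' to bytes.

    Sketch only: hex/octal constants and other forms are not handled.
    """
    match = _SPEC.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"not a decimal memory specifier: {spec!r}")
    number, suffix = match.groups()
    # No suffix means the value is counted in plain bytes.
    return int(number) * _MULTIPLIERS[suffix]


for spec in ("2", "2G", "1G", "10G", "2g"):
    print(f"{spec:>4} -> {memory_to_bytes(spec):,} bytes")
```

With these rules, "2" is 2 bytes while "2G" is 2,147,483,648 bytes, which is why the default in the complex definition needed the G suffix.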