Hi Reuti,

That's interesting, but it works without any hack:
{
   name         default_per_user
   enabled      true
   description  "Each user is entitled to resources equivalent to three nodes"
   limit        users {*} queues {all.q} to slots=192,h_vmem=1536G
}

Then it consumes from the user's quota:

$ qquota -u "*"
resource quota rule limit                filter
--------------------------------------------------------------------------------
default_per_user/1 slots=166/192        users b****** queues all.q
default_per_user/1 h_vmem=400.000G/1536 users b****** queues all.q

Is it illegal to set h_vmem in a per-user quota in the first place?
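For what it's worth, a rough sketch of how the accounting above plays out for a single submission; the PE name "smp", the 8-slot/4G request and the script name are made up purely for illustration:

$ qsub -pe smp 8 -l h_vmem=4G myjob.sh
# h_vmem is a per-slot consumable (consumable=YES in qconf -sc), so this job
# should count as 8 slots and 8 x 4G = 32G against the default_per_user rule
$ qquota -u $USER
# shows how much of slots=192 and h_vmem=1536G the submitting user has left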
Cheers,
D


On Wed, Jul 30, 2014 at 4:37 PM, Reuti <re...@staff.uni-marburg.de> wrote:

> Hi,
>
> Am 30.07.2014 um 03:29 schrieb Derrick Lin:
>
> > **No** initial value per queue instance, I force the users to specify both h_vmem and mem_requested by defining default values inside the sge_default file.
> >
> > No h_vmem on exechost level either, because we want to use mem_requested instead since it's already been set up across all exechosts.
> >
> > My original issue was, when I set params MONITOR=1 jobs failed to start.
> >
> > Now I have MONITOR=1 removed, all jobs start and run fine. Any idea?
>
> They still shouldn't start. As you defined "h_vmem" as being consumable, it's a question: consume from what?
>
> Nevertheless you can set an arbitrarily high value in the global exechost `qconf -me global` there under "complex_values".
>
> -- Reuti
>
> > D
> >
> > On Tue, Jul 29, 2014 at 7:43 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> > Hi,
> >
> > Am 29.07.2014 um 06:07 schrieb Derrick Lin:
> >
> > > This is qhost of one of our compute nodes:
> > >
> > > pwbcad@gamma01:~$ qhost -F -h omega-0-9
> > > HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> > > -------------------------------------------------------------------------------
> > > global                  -               -     -       -       -       -       -
> > > omega-0-9               lx26-amd64     64 12.34  504.9G  273.6G  256.0G   14.6G
> > >    hl:arch=lx26-amd64
> > >    hl:num_proc=64.000000
> > >    hl:mem_total=504.890G
> > >    hl:swap_total=256.000G
> > >    hl:virtual_total=760.890G
> > >    hl:load_avg=12.340000
> > >    hl:load_short=9.720000
> > >    hl:load_medium=12.340000
> > >    hl:load_long=18.900000
> > >    hl:mem_free=231.308G
> > >    hl:swap_free=241.356G
> > >    hl:virtual_free=472.663G
> > >    hl:mem_used=273.582G
> > >    hl:swap_used=14.644G
> > >    hl:virtual_used=288.226G
> > >    hl:cpu=15.400000
> > >    hl:m_topology=SCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTT
> > >    hl:m_topology_inuse=SCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTT
> > >    hl:m_socket=4.000000
> > >    hl:m_core=32.000000
> > >    hl:np_load_avg=0.192812
> > >    hl:np_load_short=0.151875
> > >    hl:np_load_medium=0.192812
> > >    hl:np_load_long=0.295312
> > >    hc:mem_requested=502.890G
> >
> > So, here is no h_vmem on an exechost level.
> >
> > > We do not set h_vmem at the queue instance level, that's intended because we just need h_vmem in a per-user quota like:
> >
> > Typo and you mean exechost level?
> >
> > > {
> > >    name         default_per_user
> > >    enabled      true
> > >    description  "Each user is entitled to resources equivalent to two nodes"
> > >    limit        users {*} queues {all.q} to slots=16,h_vmem=16G
> > > }
> >
> > RQS limits are not enforced. The user has to specify it by hand then with the -l option to `qsub`.
> >
> > Is "h_vmem" then in "complex_values" in the queue definition with an initial value per queue instance?
> >
> > -- Reuti
> >
> > > At the queue instance level, we use mem_requested as a "per host quota" instead. It's a custom complex attr we set up for our specific applications.
> > >
> > > Cheers,
> > > D
> > >
> > > On Tue, Jul 29, 2014 at 1:02 AM, Reuti <re...@staff.uni-marburg.de> wrote:
> > > Hi,
> > >
> > > Am 04.07.2014 um 06:04 schrieb Derrick Lin:
> > >
> > > > Interestingly, I have a small test cluster that basically has the same SGE setup but does *not* have such a problem. h_vmem in complex is exactly the same. The test queue instance looks almost the same (except the CPU layout etc.)
> > > >
> > > > qstat -F -q all.q@eva00
> > > > queuename                      qtype resv/used/tot. load_avg arch          states
> > > > ---------------------------------------------------------------------------------
> > > > all.q@eva00.local              BP    0/0/8          0.00     lx26-amd64
> > > > ...
> > > > hc:mem_requested=7.814G
> > > > qf:qname=all.q
> > > > qf:hostname=eva00.local
> > > > qc:slots=8
> > > > qf:tmpdir=/tmp
> > > > qf:seq_no=0
> > > > qf:rerun=0.000000
> > > > qf:calendar=NONE
> > > > qf:s_rt=infinity
> > > > qf:h_rt=infinity
> > > > qf:s_cpu=infinity
> > > > qf:h_cpu=infinity
> > > > qf:s_fsize=infinity
> > > > qf:h_fsize=infinity
> > > > qf:s_data=infinity
> > > > qf:h_data=infinity
> > > > qf:s_stack=infinity
> > > > qf:h_stack=infinity
> > > > qf:s_core=infinity
> > > > qf:h_core=infinity
> > > > qf:s_rss=infinity
> > > > qf:h_rss=infinity
> > > > qf:s_vmem=infinity
> > > > qf:h_vmem=infinity
> > > > qf:min_cpu_interval=00:05:00
> > > >
> > > > Both clusters don't have h_vmem defined at the exechost level.
> > >
> > > What is the output of:
> > >
> > > `qhost -F`
> > >
> > > Below you write that it's also defined on a queue instance level, hence in both places (as "complex_values")?
> > >
> > > -- Reuti
> > >
> > > > Derrick
> > > >
> > > > On Fri, Jul 4, 2014 at 1:58 PM, Derrick Lin <klin...@gmail.com> wrote:
> > > > Hi all,
> > > >
> > > > We started using h_vmem to control jobs by their memory usage. However, jobs couldn't start when -l h_vmem is requested. The reason is
> > > >
> > > > (-l h_vmem=1G) cannot run in queue "intel.q@delta-5-1.local" because job requests unknown resource (h_vmem)
> > > >
> > > > However, h_vmem is definitely on the queue instance:
> > > >
> > > > queuename                      qtype resv/used/tot. load_avg arch          states
> > > > ---------------------------------------------------------------------------------
> > > > intel.q@delta-5-1.local        BIP   0/0/64         6.27     lx26-amd64
> > > > ....
> > > > hl:np_load_long=0.091563
> > > > hc:mem_requested=504.903G
> > > > qf:qname=intel.q
> > > > qf:hostname=delta-5-1.local
> > > > qc:slots=64
> > > > qf:tmpdir=/tmp
> > > > qf:seq_no=0
> > > > qf:rerun=0.000000
> > > > qf:calendar=NONE
> > > > qf:s_rt=infinity
> > > > qf:h_rt=infinity
> > > > qf:s_cpu=infinity
> > > > qf:h_cpu=infinity
> > > > qf:s_fsize=infinity
> > > > qf:h_fsize=infinity
> > > > qf:s_data=infinity
> > > > qf:h_data=infinity
> > > > qf:s_stack=infinity
> > > > qf:h_stack=infinity
> > > > qf:s_core=infinity
> > > > qf:h_core=infinity
> > > > qf:s_rss=infinity
> > > > qf:h_rss=infinity
> > > > qf:s_vmem=infinity
> > > > qf:h_vmem=infinity
> > > > qf:min_cpu_interval=00:05:00
> > > >
> > > > I tried to specify other attrs such as h_rt, and those jobs started and finished successfully.
> > > >
> > > > qconf -sc
> > > >
> > > > #name               shortcut   type      relop requestable consumable default  urgency
> > > > #----------------------------------------------------------------------------------------
> > > > h_vmem              h_vmem     MEMORY    <=    YES         YES        0        0
> > > > #
> > > >
> > > > Can anyone shed light on this?
> > > >
> > > > Cheers,
> > > > Derrick
> > > >
> > > > _______________________________________________
> > > > users mailing list
> > > > users@gridengine.org
> > > > https://gridengine.org/mailman/listinfo/users
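To make Reuti's suggestion above concrete, a minimal sketch of giving the h_vmem consumable something to consume from on the global pseudo host; the 4096G figure is only a placeholder for an "arbitrarily high" capacity, not a recommendation:

$ qconf -me global
# in the editor that opens, add h_vmem to the complex_values line, e.g.:
complex_values        h_vmem=4096G

Once the consumable has a capacity defined somewhere (global host, exechost or queue instance), each job's -l h_vmem request is debited from that pool in addition to the per-user RQS, and the remaining amount shows up in `qhost -F h_vmem` and `qstat -F h_vmem`.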
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users