Interestingly, I have a small test cluster with basically the same SGE
setup that does *not* have this problem. The h_vmem entry in the complex is
exactly the same, and the test queue instance looks almost the same (except
for the CPU layout etc.):

 qstat -F -q all.q@eva00
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@eva00.local              BP    0/0/8          0.00     lx26-amd64
       ...
        hc:mem_requested=7.814G
        qf:qname=all.q
        qf:hostname=eva00.local
        qc:slots=8
        qf:tmpdir=/tmp
        qf:seq_no=0
        qf:rerun=0.000000
        qf:calendar=NONE
        qf:s_rt=infinity
        qf:h_rt=infinity
        qf:s_cpu=infinity
        qf:h_cpu=infinity
        qf:s_fsize=infinity
        qf:h_fsize=infinity
        qf:s_data=infinity
        qf:h_data=infinity
        qf:s_stack=infinity
        qf:h_stack=infinity
        qf:s_core=infinity
        qf:h_core=infinity
        qf:s_rss=infinity
        qf:h_rss=infinity
        qf:s_vmem=infinity
        qf:h_vmem=infinity
        qf:min_cpu_interval=00:05:00
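
The two-letter prefix on each attribute line says where the value comes
from. According to the qstat man page, the second letter is c for an
availability derived from a consumable and f for a fixed (non-consumable)
value or limit. As a minimal sketch, the helper below pulls the prefix for
one resource out of qstat -F output read on stdin. It assumes attribute
lines have the prefix:name=value shape shown above; resource_prefix is a
hypothetical name of my own, not an SGE tool.

```shell
# Print the two-letter qstat -F prefix(es) reported for one resource.
# Reads `qstat -F` output on stdin; $1 is the resource name (e.g. h_vmem).
# Assumption: attribute lines look like "qf:h_vmem=infinity".
resource_prefix() {
    awk -v r="$1" '
        {
            # Split the first field on ":" and "=" -> prefix, name, value...
            n = split($1, parts, /[:=]/)
            if (n >= 3 && parts[2] == r) print parts[1]
        }
    '
}
```

For example:

 qstat -F -q all.q@eva00 | resource_prefix h_vmem

For the listing above this prints qf, i.e. the value shown is the queue's
fixed limit rather than a consumable capacity.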

Neither cluster has h_vmem defined at the exechost level.
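
A minimal sketch of how the exechost level can be checked on each cluster.
It assumes the output of qconf -se <host> carries the host's capacities on
a single line of the form "complex_values name=value,..." (or
"complex_values NONE"); has_complex_value is a hypothetical helper name,
not part of SGE.

```shell
# Succeed (exit 0) if the resource named in $1 appears in the
# complex_values line of `qconf -se <host>` output read on stdin.
# Assumption: complex_values is on one line, comma-separated name=value.
has_complex_value() {
    awk -v r="$1" '
        $1 == "complex_values" {
            n = split($2, pairs, ",")
            for (i = 1; i <= n; i++) {
                split(pairs[i], kv, "=")
                if (kv[1] == r) found = 1
            }
        }
        END { exit !found }   # 0 when found, 1 otherwise
    '
}
```

For example:

 qconf -se eva00.local | has_complex_value h_vmem && echo "h_vmem defined"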

Derrick


On Fri, Jul 4, 2014 at 1:58 PM, Derrick Lin <klin...@gmail.com> wrote:

> Hi all,
>
> We have started using h_vmem to control jobs by their memory usage.
> However, jobs cannot start when -l h_vmem is requested. The reason given is:
>
> (-l h_vmem=1G) cannot run in queue "intel.q@delta-5-1.local" because job
> requests unknown resource (h_vmem)
>
> However, h_vmem is definitely on the queue instance:
>
> queuename                      qtype resv/used/tot. load_avg arch          states
> ---------------------------------------------------------------------------------
> intel.q@delta-5-1.local        BIP   0/0/64         6.27     lx26-amd64
>         ....
>         hl:np_load_long=0.091563
>         hc:mem_requested=504.903G
>         qf:qname=intel.q
>         qf:hostname=delta-5-1.local
>         qc:slots=64
>         qf:tmpdir=/tmp
>         qf:seq_no=0
>         qf:rerun=0.000000
>         qf:calendar=NONE
>         qf:s_rt=infinity
>         qf:h_rt=infinity
>         qf:s_cpu=infinity
>         qf:h_cpu=infinity
>         qf:s_fsize=infinity
>         qf:h_fsize=infinity
>         qf:s_data=infinity
>         qf:h_data=infinity
>         qf:s_stack=infinity
>         qf:h_stack=infinity
>         qf:s_core=infinity
>         qf:h_core=infinity
>         qf:s_rss=infinity
>         qf:h_rss=infinity
>         qf:s_vmem=infinity
>         qf:h_vmem=infinity
>         qf:min_cpu_interval=00:05:00
>
> When I specified other attributes such as h_rt instead, jobs started and
> finished successfully.
>
> qconf -sc
> #name               shortcut   type        relop requestable consumable default  urgency
> #----------------------------------------------------------------------------------------
> h_vmem              h_vmem     MEMORY      <=    YES         YES        0        0
> #
>
> Can anyone shed light on this?
>
> Cheers,
> Derrick
>
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
