Is mem_free defined in the host's complex_values? What does qconf -sc | grep mem_free
show? Is there a default value defined?

Ian

On Fri, Jan 23, 2015 at 11:30 AM, Ilya M <4ilya.m+g...@gmail.com> wrote:
> Because I am testing with qsub -w v, the job is not accepted for
> scheduling, no job ID is generated, and qstat -j will not work. The output
> of qsub is as I showed in the original email:
>
> Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu001" because job
> requests unknown resource (mem_free)
> Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu002" because job
> requests unknown resource (mem_free)
> Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu003" because job
> requests unknown resource (mem_free)
> Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu004" because job
> requests unknown resource (mem_free)
> Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu005" because job
> requests unknown resource (mem_free)
> Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu006" because job
> requests unknown resource (mem_free)
> ...
>
> Ilya.
>
>
> -------- Original Message --------
> Subject: Re: [gridengine users] Cannot request resource if it is a load
> value of memory type: SGE reports it as unknown resource
> From: Feng Zhang <prod.f...@gmail.com>
> To: Ilya M <4ilya.m+g...@gmail.com>
> Date: 1/23/15, 9:27 AM
>>
>> Ilya,
>>
>> Can you please run:
>>
>> qstat -j <jobid>
>>
>> and paste the output here? It may be useful for diagnosing the problem.
>>
>> On Fri, Jan 23, 2015 at 12:08 PM, Ilya M <4ilya.m+g...@gmail.com> wrote:
>>>
>>> I removed the quota limits, to no avail: same problem.
>>>
>>>
>>> -------- Original Message --------
>>> Subject: Re: [gridengine users] Cannot request resource if it is a load
>>> value of memory type: SGE reports it as unknown resource
>>> From: Reuti <re...@staff.uni-marburg.de>
>>> To: Ilya M <4ilya.m+g...@gmail.com>
>>> Date: 1/23/15, 2:33 AM
>>>>
>>>> Can you remove them temporarily?
>>>> I have seen cases where the "unknown resource" error suddenly popped up
>>>> and also suddenly vanished again; my conclusion was that it was somehow
>>>> connected to RQS.
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>> On 23.01.2015 at 00:16, Ilya M <4ilya.m+g...@gmail.com> wrote:
>>>>>
>>>>> There are two RQS; one is disabled:
>>>>>
>>>>> {
>>>>>    name         limit_for_interns
>>>>>    description  "limit to max 5 GPU jobs per intern."
>>>>>    enabled      TRUE
>>>>>    limit        users {int1,int2} hosts @gpu to slots=5
>>>>> }
>>>>> {
>>>>>    name         limit_slots
>>>>>    description  NONE
>>>>>    enabled      FALSE
>>>>>    limit        hosts {@gpu} to slots=2
>>>>> }
>>>>>
>>>>>
>>>>> -------- Original Message --------
>>>>> Subject: Re: [gridengine users] Cannot request resource if it is a load
>>>>> value of memory type: SGE reports it as unknown resource
>>>>> From: Reuti <re...@staff.uni-marburg.de>
>>>>> To: Ilya <4ilya.m+g...@gmail.com>
>>>>> Date: 1/21/15, 16:12
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 22.01.2015 at 00:52, Ilya wrote:
>>>>>>
>>>>>>> Something happened to our SGE (6.2u5) installation, which had been
>>>>>>> running fine for many months: users can no longer submit resource
>>>>>>> requests for load values of memory type, e.g.
>>>>>>>
>>>>>>> qsub -l mem_free=5G -w v .... produces the following output:
>>>>>>>
>>>>>>> cannot run in queue "gpu.q@gpu038" because job requests unknown
>>>>>>> resource (mem_free)
>>>>>>>
>>>>>>> The resource is available, though, when querying for it:
>>>>>>>
>>>>>>> qhost -F mem_free -h gpu038
>>>>>>> HOSTNAME   ARCH        NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>>>>>>> -----------------------------------------------------------------
>>>>>>> global     -              -     -       -       -       -       -
>>>>>>> gpu038     lx24-amd64    16  2.11  126.1G   15.7G    4.0G     0.0
>>>>>>>     Host Resource(s):   hl:mem_free=110.416G
>>>>>>>
>>>>>>>
>>>>>>> This was first reported by a user when he tried to request a custom
>>>>>>> "hl" resource.
>>>>>>> However, it now appears that all "hl" resources of type "memory"
>>>>>>> show this behavior. Integer "hl" resources are OK.
>>>>>>
>>>>>> Do you have any RQS in place?
>>>>>>
>>>>>> -- Reuti
>>>>>>
>>>>>>
>>>>>>> I bounced qmaster between the master and the shadow master a couple
>>>>>>> of times, but it did not resolve the problem.
>>>>>>>
>>>>>>> Additionally, when I added MONITOR=1 to the scheduler's configuration,
>>>>>>> the file $SGE_ROOT/$SGE_CELL/common/schedule contains only colons:
>>>>>>> ::::::::
>>>>>>> ::::::::
>>>>>>> ::::::::
>>>>>>>
>>>>>>> Any ideas?
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> users@gridengine.org
>>>>>>> https://gridengine.org/mailman/listinfo/users

--
Ian Kaufman
Research Systems Administrator
UC San Diego, Jacobs School of Engineering
ikaufman AT ucsd DOT edu
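Ian's first question can be checked directly against the complex configuration. The function below is a minimal sketch, assuming the SGE 6.2 column order of qconf -sc output (name, shortcut, type, relop, requestable, consumable, default, urgency); complex_info is a hypothetical helper name, not an SGE command. It prints the type, requestable flag, consumable flag and default value of one complex, matched by full name or shortcut, or reports that the complex is not defined.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): show how one complex is defined
# in `qconf -sc` output read from stdin.
# Assumed column order (SGE 6.2):
#   name shortcut type relop requestable consumable default urgency
complex_info() {
    awk -v want="$1" '
        /^#/ { next }                  # skip the header and separator lines
        $1 == want || $2 == want {     # match by full name or by shortcut
            printf "%s type=%s requestable=%s consumable=%s default=%s\n",
                   $1, $3, $5, $6, $7
            found = 1
        }
        END { if (!found) print want ": not defined" }
    '
}
```

Usage: qconf -sc | complex_info mem_free. For the other half of the question, qconf -se gpu038 shows that execution host's configuration, including its complex_values and the load_values it currently reports.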