On 20.12.2016 at 23:42, Christopher Black wrote:

> We have found that the behavior that multiplies consumable memory resource 
> requests by the number of PE slots can be confusing (and requires extra math in 
> automation scripts), so we have the complex's consumable value set to “JOB” 
> rather than “YES”. When this is done (at least on SoGE), the memory requested 
> is NOT multiplied by the number of slots. We also use h_vmem rather than 
> virtual_free.

Correct, it's not multiplied. But only the master exechost will get its memory 
reduced in the bookkeeping. I fear the slave exechosts might still show too high 
a value for the available memory.
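A minimal Python sketch of the bookkeeping difference discussed above, assuming a consumable set to "YES" is debited per slot on every host of a parallel job, while one set to "JOB" is debited once, on the master host only (the behaviour Reuti suspects, not verified against Grid Engine source). The host names, slot layout and the `debit_per_host` function are hypothetical.

```python
from typing import Dict

GIB = 1024 ** 3  # bytes in one "G" (1024-based)


def debit_per_host(request_bytes: int, slots_per_host: Dict[str, int],
                   master: str, consumable: str) -> Dict[str, int]:
    """Bytes debited from each host's consumable for one parallel job.

    consumable="YES": the request is per slot, so every host is debited
    request_bytes times the number of slots it received.
    consumable="JOB": the request is per job and (assumed here) only the
    master host is debited, once.
    """
    if consumable == "YES":
        return {host: request_bytes * n for host, n in slots_per_host.items()}
    if consumable == "JOB":
        return {host: request_bytes if host == master else 0
                for host in slots_per_host}
    raise ValueError(f"unsupported consumable setting: {consumable!r}")


# Hypothetical 7-slot job requesting 10G, spread over two hosts.
layout = {"nodeA": 4, "nodeB": 3}
for mode in ("YES", "JOB"):
    debits = debit_per_host(10 * GIB, layout, master="nodeA", consumable=mode)
    print(mode, {host: b // GIB for host, b in debits.items()})
# YES {'nodeA': 40, 'nodeB': 30}
# JOB {'nodeA': 10, 'nodeB': 0}
```

In the "JOB" case the second host is not debited at all, so it keeps advertising its full memory even though job processes run there, which is the overcommit risk on the slave exechosts.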

-- Reuti


> Best,
> Chris
> 
> On 12/20/16, 5:11 AM, "users-boun...@gridengine.org on behalf of Reuti" 
> <users-boun...@gridengine.org on behalf of re...@staff.uni-marburg.de> wrote:
> 
> 
>> On 20.12.2016 at 02:45, John_Tai <john_...@smics.com> wrote:
>> 
>> I spoke too soon. I can request PE and virtual_free separately, but I cannot 
>> request both:
>> 
>> 
>> 
>> # qsub -V -b y -cwd -now n -pe cores 7 -l mem=10G -q all.q@ibm037 xclock
> 
>    Above you request "mem" (which is a snapshot of the actual usage and may 
> vary over the runtime of other jobs [unless they request the total amount 
> already at the beginning of the job and stay with it]).
> 
>> Your job 180 ("xclock") has been submitted
>> # qstat
>> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
>> -----------------------------------------------------------------------------------------------------------------
>>     180 0.55500 xclock     johnt        qw    12/20/2016 09:43:41                                    7
>> # qstat -j 180
>> ==============================================================
>> job_number:                 180
>> exec_file:                  job_scripts/180
>> submission_time:            Tue Dec 20 09:43:41 2016
>> owner:                      johnt
>> uid:                        162
>> group:                      sa
>> gid:                        4563
>> sge_o_home:                 /home/johnt
>> sge_o_log_name:             johnt
>> sge_o_path:                 /home/sge/sge8.1.9-1.el5/bin:/home/sge/sge8.1.9-1.el5/bin/lx-amd64:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/home/johnt/bin:.
>> sge_o_shell:                /bin/tcsh
>> sge_o_workdir:              /home/johnt/sge8
>> sge_o_host:                 ibm005
>> account:                    sge
>> cwd:                        /home/johnt/sge8
>> hard resource_list:         virtual_free=10G
> 
>    10G times 7 = 70 GB.
> 
>    Does the node have this amount of memory installed, and is it defined this 
> way in `qconf -me ibm037`?
> 
>    -- Reuti
> 
> 
>> mail_list:                  johnt@ibm005
>> notify:                     FALSE
>> job_name:                   xclock
>> jobshare:                   0
>> hard_queue_list:            all.q@ibm037
>> env_list:                   TERM=xterm,DISPLAY=dsls11:3. [..]
>> script_file:                xclock
>> parallel environment:  cores range: 7
>> binding:                    NONE
>> job_type:                   binary
>> scheduling info:            cannot run in queue "sim.q" because it is not contained in its hard queue list (-q)
>>                             cannot run in queue "pc.q" because it is not contained in its hard queue list (-q)
>>                             cannot run in PE "cores" because it only offers 0 slots
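The "10G times 7 = 70 GB" arithmetic can be written as a rough feasibility check. This is a simplified, hypothetical model, not the scheduler's actual algorithm: it assumes a per-slot ("YES") memory consumable and a single target host, and `slots_offered`, `can_start` and the 64G/8-slot host are illustrative only.

```python
GIB = 1024 ** 3  # bytes in one "G" (1024-based)


def slots_offered(host_free_bytes: int, per_slot_bytes: int, host_slots: int) -> int:
    """Slots one host could grant if every slot consumes per_slot_bytes."""
    if per_slot_bytes <= 0:
        return host_slots  # no memory limit in effect
    return min(host_slots, host_free_bytes // per_slot_bytes)


def can_start(requested_slots: int, host_free_bytes: int,
              per_slot_bytes: int, host_slots: int) -> bool:
    """True if a single-host PE job of requested_slots fits on this host."""
    return slots_offered(host_free_bytes, per_slot_bytes, host_slots) >= requested_slots


# Hypothetical host: 8 slots, virtual_free=64G in its complex_values.
print(slots_offered(64 * GIB, 10 * GIB, 8))  # 6, because 7 * 10G = 70G > 64G
print(can_start(7, 64 * GIB, 10 * GIB, 8))   # False
print(can_start(7, 70 * GIB, 10 * GIB, 8))   # True
```

The point of the model is only that the per-slot request times the slot count has to fit into the memory configured for the host.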
>> 
>> 
>> 
>> 
>> 
>> -----Original Message-----
>> From: Reuti [mailto:re...@staff.uni-marburg.de]
>> Sent: Saturday, December 17, 2016 10:16
>> To: Reuti
>> Cc: John_Tai; users@gridengine.org; Coleman, Marcus [JRDUS Non-J&J]
>> Subject: Re: [gridengine users] John's cores pe (Was: users Digest...)
>> 
>> 
>> On 17.12.2016 at 11:34, Reuti wrote:
>> 
>>> 
>>> On 17.12.2016 at 02:01, John_Tai wrote:
>>> 
>>>> It is working!! Thank you to all that replied to me and helped me figure 
>>>> this out.
>>>> 
>>>> I meant to set the default to 2G, so that was my mistake. I changed it to:
>>>> 
>>>> virtual_free        mem        MEMORY    <=    YES         YES        2G        0
>>> 
>>> That's strange. For me, a plain "2" has always meant two bytes. An "h_vmem" of 
>>> 2 bytes would crash the job instantly when it got scheduled, but for 
>>> "virtual_free" (which is only guidance for SGE on how to distribute jobs) it 
>>> shouldn't hinder the scheduling at all.
>>> 
>>> `man sge_types` also lists:
>>> 
>>>     If no multiplier is present, the value is  just  counted  in bytes.
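A small sketch of how such memory specifiers turn into bytes, assuming the multipliers described in `man sge_types` (lowercase k, m, g as powers of 1000; uppercase K, M, G as powers of 1024). Hexadecimal/octal forms and any other suffixes are deliberately not handled, and `memory_bytes` is a hypothetical helper, not part of Grid Engine.

```python
import re

# Assumed multipliers (per sge_types): lowercase = powers of 1000,
# uppercase = powers of 1024; no suffix means plain bytes.
MULTIPLIERS = {
    "": 1,
    "k": 1000, "K": 1024,
    "m": 1000 ** 2, "M": 1024 ** 2,
    "g": 1000 ** 3, "G": 1024 ** 3,
}

_SPEC = re.compile(r"(\d+)([kKmMgG]?)")


def memory_bytes(spec: str) -> int:
    """Convert a memory specifier such as '2', '2G' or '512M' to bytes."""
    match = _SPEC.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"not a supported memory specifier: {spec!r}")
    number, suffix = match.groups()
    return int(number) * MULTIPLIERS[suffix]


for value in ("2", "2G", "1G", "10G"):
    print(value, "->", memory_bytes(value), "bytes")
```

With this reading, the default of "2" in the complex definition is two bytes, while "2G" is 2147483648 bytes.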
>> 
>> We have set "-w e" in /usr/sge/default/common/sge_request, and with that I even 
>> get "Unable to run job: error: no suitable queues." This happens whether the 
>> low 2-byte value is specified in the complex definition (`qconf -mc`) or on 
>> the command line as "-l virtual_free=2".
>> 
>> It turns out that the minimum value which is accepted is 33.
>> 
>> -- Reuti
>> 
>> 
>>> 
>>>> And it's working now, although I'm not sure why it affected the PE.
>>>> 
>>>> Also, I didn't set a global one. What is the purpose of the global one? 
>>>> Should I set it?
>>> 
>>> No, it was just one more place I would have checked. The global complexes 
>>> there can, for example, be used to limit the number of floating licenses of 
>>> an application which can be used anywhere in the cluster (though one might 
>>> prefer to put such a limit in an RQS).
>>> 
>>> If you had set it up there, it would have been the "overall limit of 
>>> memory which can be used in the complete cluster at the same time".
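A hypothetical model of that layering: when a consumable is defined globally (`qconf -me global`) as well as per host, a request has to fit at every level where the consumable is defined, and a granted request is debited from each of those levels. The `Cluster` class and the three-license example are illustrative assumptions, not Grid Engine code.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Cluster:
    """Remaining capacity of one consumable; None means 'not defined here'."""
    global_free: Optional[int] = None
    host_free: Dict[str, int] = field(default_factory=dict)

    def fits(self, host: str, amount: int) -> bool:
        # Every level that defines the consumable must have enough left.
        if self.global_free is not None and amount > self.global_free:
            return False
        host_limit = self.host_free.get(host)
        return host_limit is None or amount <= host_limit

    def consume(self, host: str, amount: int) -> None:
        # Debit the global pool and, if defined, the host's own pool.
        if not self.fits(host, amount):
            raise RuntimeError(f"{amount} does not fit on {host}")
        if self.global_free is not None:
            self.global_free -= amount
        if host in self.host_free:
            self.host_free[host] -= amount


# Hypothetical: 3 floating licenses defined only at the global level.
licenses = Cluster(global_free=3)
licenses.consume("hostA", 2)
licenses.consume("hostB", 1)
print(licenses.fits("hostC", 1))  # False: the cluster-wide pool is used up
```

A global memory value would behave the same way: it caps the sum over all hosts, on top of whatever each host defines.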
>>> 
>>> -- Reuti
>>> 
>>> 
>>>> # qconf -se global
>>>> hostname              global
>>>> load_scaling          NONE
>>>> complex_values        NONE
>>>> load_values           NONE
>>>> processors            0
>>>> user_lists            NONE
>>>> xuser_lists           NONE
>>>> projects              NONE
>>>> xprojects             NONE
>>>> usage_scaling         NONE
>>>> report_variables      NONE
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: Reuti [mailto:re...@staff.uni-marburg.de]
>>>> Sent: Friday, December 16, 2016 7:36
>>>> To: John_Tai
>>>> Cc: Christopher Heiny; users@gridengine.org; Coleman, Marcus [JRDUS Non-J&J]
>>>> Subject: Re: [gridengine users] John's cores pe (Was: users Digest...)
>>>> 
>>>> 
>>>>> On 16.12.2016 at 09:53, John_Tai <john_...@smics.com> wrote:
>>>>> 
>>>>> virtual_free        mem        MEMORY    <=    YES         YES        2         0
>>>> 
>>>> This would mean that the default consumption is 2 bytes. I had already feared 
>>>> that a high value was configured here. A default of 1G or so would be more 
>>>> suitable.
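To make the column meanings explicit, here is a sketch that splits a complex definition line into named fields, assuming the eight-column layout shown in this thread (name, shortcut, type, relop, requestable, consumable, default, urgency); other Grid Engine versions may print extra columns. The shortcut column holds `mem`, consistent with `-l mem=10G` appearing as `virtual_free=10G` in the `qstat -j` output. `parse_complex_line` is a hypothetical helper.

```python
from typing import Dict

# Column order of a `qconf -sc` / `qconf -mc` line as shown in this thread.
COLUMNS = ("name", "shortcut", "type", "relop",
           "requestable", "consumable", "default", "urgency")


def parse_complex_line(line: str) -> Dict[str, str]:
    """Split one complex definition line into named fields."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


before = parse_complex_line("virtual_free mem MEMORY <= YES YES 2 0")
after = parse_complex_line("virtual_free mem MEMORY <= YES YES 2G 0")
print(before["shortcut"], before["default"], after["default"])  # mem 2 2G
```

The only difference between the two definitions is the `default` field, i.e. what a job consumes when it does not request virtual_free itself.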
>>>> 
>>>> Is there any virtual_free complex defined at the global level (qconf -se 
>>>> global)?
>>>> 
>>>> -- Reuti
>>>> ________________________________
>>>> 
>>>> This email (including its attachments, if any) may be confidential and 
>>>> proprietary information of SMIC, and intended only for the use of the 
>>>> named recipient(s) above. Any unauthorized use or disclosure of this email 
>>>> is strictly prohibited. If you are not the intended recipient(s), please 
>>>> notify the sender immediately and delete this email from your computer.
>>>> 
>>> 
>>> 
>>> _______________________________________________
>>> users mailing list
>>> users@gridengine.org
>>> https://gridengine.org/mailman/listinfo/users
>> 
> 
> 
> This electronic message is intended for the use of the named recipient only, 
> and may contain information that is confidential, privileged or protected 
> from disclosure under applicable law. If you are not the intended recipient, 
> or an employee or agent responsible for delivering this message to the 
> intended recipient, you are hereby notified that any reading, disclosure, 
> dissemination, distribution, copying or use of the contents of this message 
> including any of its attachments is strictly prohibited. If you have received 
> this message in error or are not the named recipient, please notify us 
> immediately by contacting the sender at the electronic mail address noted 
> above, and destroy all copies of this message. Please note, the recipient 
> should check this email and any attachments for the presence of viruses. The 
> organization accepts no liability for any damage caused by any virus 
> transmitted by this email.
> 

