# qconf -sc
#name               shortcut   type    relop   requestable consumable default  urgency
#--------------------------------------------------------------------------------------
arch                a          STRING  ==      YES         NO         NONE     0
calendar            c          STRING  ==      YES         NO         NONE     0
cpu                 cpu        DOUBLE  >=      YES         NO         0        0
display_win_gui     dwg        BOOL    ==      YES         NO         0        0
h_core              h_core     MEMORY  <=      YES         NO         0        0
h_cpu               h_cpu      TIME    <=      YES         NO         0:0:0    0
h_data              h_data     MEMORY  <=      YES         NO         0        0
h_fsize             h_fsize    MEMORY  <=      YES         NO         0        0
h_rss               h_rss      MEMORY  <=      YES         NO         0        0
h_rt                h_rt       TIME    <=      YES         NO         0:0:0    0
h_stack             h_stack    MEMORY  <=      YES         NO         0        0
h_vmem              h_vmem     MEMORY  <=      YES         NO         0        0
hostname            h          HOST    ==      YES         NO         NONE     0
load_avg            la         DOUBLE  >=      NO          NO         0        0
load_long           ll         DOUBLE  >=      NO          NO         0        0
load_medium         lm         DOUBLE  >=      NO          NO         0        0
load_short          ls         DOUBLE  >=      NO          NO         0        0
m_core              core       INT     <=      YES         NO         0        0
m_socket            socket     INT     <=      YES         NO         0        0
m_thread            thread     INT     <=      YES         NO         0        0
m_topology          topo       STRING  ==      YES         NO         NONE     0
m_topology_inuse    utopo      STRING  ==      YES         NO         NONE     0
mem_free            mf         MEMORY  <=      YES         NO         0        0
mem_total           mt         MEMORY  <=      YES         NO         0        0
mem_used            mu         MEMORY  >=      YES         NO         0        0
min_cpu_interval    mci        TIME    <=      NO          NO         0:0:0    0
np_load_avg         nla        DOUBLE  >=      NO          NO         0        0
np_load_long        nll        DOUBLE  >=      NO          NO         0        0
np_load_medium      nlm        DOUBLE  >=      NO          NO         0        0
np_load_short       nls        DOUBLE  >=      NO          NO         0        0
num_proc            p          INT     ==      YES         NO         0        0
qname               q          STRING  ==      YES         NO         NONE     0
rerun               re         BOOL    ==      NO          NO         0        0
s_core              s_core     MEMORY  <=      YES         NO         0        0
s_cpu               s_cpu      TIME    <=      YES         NO         0:0:0    0
s_data              s_data     MEMORY  <=      YES         NO         0        0
s_fsize             s_fsize    MEMORY  <=      YES         NO         0        0
s_rss               s_rss      MEMORY  <=      YES         NO         0        0
s_rt                s_rt       TIME    <=      YES         NO         0:0:0    0
s_stack             s_stack    MEMORY  <=      YES         NO         0        0
s_vmem              s_vmem     MEMORY  <=      YES         NO         0        0
seq_no              seq        INT     ==      NO          NO         0        0
slots               s          INT     <=      YES         YES        1        1000
swap_free           sf         MEMORY  <=      YES         NO         0        0
swap_rate           sr         MEMORY  >=      YES         NO         0        0
swap_rsvd           srsv       MEMORY  >=      YES         NO         0        0
swap_total          st         MEMORY  <=      YES         NO         0        0
swap_used           su         MEMORY  >=      YES         NO         0        0
tmpdir              tmp        STRING  ==      NO          NO         NONE     0
virtual_free        mem        MEMORY  <=      YES         YES        2        0
virtual_total       vt         MEMORY  <=      YES         NO         0        0
virtual_used        vu         MEMORY  >=      YES         NO         0        0
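[Editor's note: `virtual_free` is the only consumable here besides `slots` (consumable YES, default 2). For a consumable, Grid Engine charges every job its per-slot request, or the complex's `default` when the job does not request the resource at all, multiplied by the number of slots a PE job occupies, and checks that total against the host's `complex_values` capacity. A rough sketch of that accounting, purely as illustration (not SGE source; the 4G/2G per-slot requests are made-up numbers):]

```python
GiB = 1024 ** 3

def consumable_demand(per_slot_request, slots):
    """Total demand for a consumable: the per-slot request (or the
    complex's default, if the job requests nothing) times the slot count."""
    return per_slot_request * slots

def host_offers_slots(capacity, per_slot_request, slots):
    """True if a host's complex_values capacity covers a PE job's total demand."""
    return consumable_demand(per_slot_request, slots) <= capacity

# ibm038 offers virtual_free=16G: a 7-slot PE job asking 4G per slot needs
# 28G in total and is rejected, while 2G per slot (14G total) would fit.
print(host_offers_slots(16 * GiB, 4 * GiB, 7))  # False
print(host_offers_slots(16 * GiB, 2 * GiB, 7))  # True
```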
Best Regards,
John Tai
Design Services
Semiconductor Manufacturing International (Shanghai) Corp.
Tel: 21-3861-0000 ext. 16116
E-Fax: 21-5080-4000 ext. 02906

-----Original Message-----
From: Reuti [mailto:re...@staff.uni-marburg.de]
Sent: Friday, December 16, 2016 4:22
To: John_Tai
Cc: Christopher Heiny; users@gridengine.org; Coleman, Marcus [JRDUS Non-J&J]
Subject: Re: [gridengine users] John's cores pe (Was: users Digest...)

On 16.12.2016 at 03:54, John_Tai wrote:

> I have pinpointed the problem, but I don't know how to solve it.
>
> It looks like hosts with the virtual_free complex set cannot run jobs that request a PE, even though the job did not request virtual_free. I set up virtual_free to allow jobs to request RAM; the goal is for each job to request both RAM and a number of CPU cores. Hopefully this helps in figuring out a solution. Thanks.

What does the definition of the complex look like in `qconf -sc`?

-- Reuti

> Here's an example of one host that doesn't work:
>
> # qconf -se ibm038
> hostname              ibm038
> load_scaling          NONE
> complex_values        virtual_free=16G
>
> # qsub -V -b y -cwd -now n -pe cores 7 -q all.q@ibm038 xclock
> Your job 143 ("xclock") has been submitted
> # qstat -j 143
> ==============================================================
> job_number:                 143
> exec_file:                  job_scripts/143
> submission_time:            Fri Dec 16 10:46:02 2016
> owner:                      johnt
> uid:                        162
> group:                      sa
> gid:                        4563
> sge_o_home:                 /home/johnt
> sge_o_log_name:             johnt
> sge_o_path:                 /home/sge/sge8.1.9-1.el5/bin:/home/sge/sge8.1.9-1.el5/bin/lx-amd64:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/home/johnt/bin:.
> sge_o_shell:                /bin/tcsh
> sge_o_workdir:              /home/johnt/sge8
> sge_o_host:                 ibm005
> account:                    sge
> cwd:                        /home/johnt/sge8
> mail_list:                  johnt@ibm005
> notify:                     FALSE
> job_name:                   xclock
> jobshare:                   0
> hard_queue_list:            all.q@ibm038
> env_list:                   TERM=xterm,DISPLAY=dsls11:3.0,HOME= [..]
> script_file:                xclock
> parallel environment:  cores range: 7
> binding:                    NONE
> job_type:                   binary
> scheduling info:            cannot run in PE "cores" because it only offers 0 slots
>
> Here's an example of a host that does work:
>
> # qconf -se ibm037
> hostname              ibm037
> load_scaling          NONE
> complex_values        NONE
>
> # qsub -V -b y -cwd -now n -pe cores 7 -q all.q@ibm037 xclock
> Your job 144 ("xclock") has been submitted
> # qstat -j 144
> ==============================================================
> job_number:                 144
> exec_file:                  job_scripts/144
> submission_time:            Fri Dec 16 10:49:35 2016
> owner:                      johnt
> uid:                        162
> group:                      sa
> gid:                        4563
> sge_o_home:                 /home/johnt
> sge_o_log_name:             johnt
> sge_o_path:                 /home/sge/sge8.1.9-1.el5/bin:/home/sge/sge8.1.9-1.el5/bin/lx-amd64:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/home/johnt/bin:.
> sge_o_shell:                /bin/tcsh
> sge_o_workdir:              /home/johnt/sge8
> sge_o_host:                 ibm005
> account:                    sge
> cwd:                        /home/johnt/sge8
> mail_list:                  johnt@ibm005
> notify:                     FALSE
> job_name:                   xclock
> jobshare:                   0
> hard_queue_list:            all.q@ibm037
> env_list:                   TERM=xterm,DISPLAY=dsls11:3.0,HOME=/home/johnt
> [..]
> script_file:                xclock
> parallel environment:  cores range: 7
> binding:                    NONE
> job_type:                   binary
> usage    1:                 cpu=00:00:00, mem=0.00000 GB s, io=0.00000 GB, vmem=N/A, maxvmem=N/A
> binding    1:               NONE
>
>
> From: users-boun...@gridengine.org [mailto:users-boun...@gridengine.org] On Behalf Of John_Tai
> Sent: Wednesday, December 14, 2016 3:52
> To: Christopher Heiny
> Cc: users@gridengine.org; Coleman, Marcus [JRDUS Non-J&J]
> Subject: Re: [gridengine users] John's cores pe (Was: users Digest...)
>
> I'm actually using sge8.1.9-1 for all. Is there a problem with that?
> Downloaded here:
>
> http://arc.liv.ac.uk/downloads/SGE/releases/8.1.9/
>
>
> From: Christopher Heiny [mailto:christopherhe...@gmail.com]
> Sent: Wednesday, December 14, 2016 3:26
> To: John_Tai
> Cc: users@gridengine.org; Coleman, Marcus [JRDUS Non-J&J]; Reuti; Christopher Heiny
> Subject: Re: [gridengine users] John's cores pe (Was: users Digest...)
>
> On Dec 13, 2016 7:04 PM, "John_Tai" <john_...@smics.com> wrote:
> I have 3 hosts in all.q. It seems the 2 servers running RHEL5.3 (ibm037, ibm038) do not work with the PE, while the server running RHEL6.8 (ibm021) works OK. Their configurations are identical:
>
> Hmmmm. Might be a Grid Engine version mismatch issue. If you installed from RH rpms, then I think EL5.3 is on 6.1u4 and EL6.8 is on 6.2u3 or 6.2u5.
>
> # qconf -sq all.q@ibm038
> qname                 all.q
> hostname              ibm038
> seq_no                0
> load_thresholds       np_load_avg=1.75
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:05:00
> priority              0
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 BATCH INTERACTIVE
> ckpt_list             NONE
> pe_list               cores
> rerun                 FALSE
> slots                 8
> tmpdir                /tmp
> shell                 /bin/sh
> prolog                NONE
> epilog                NONE
> shell_start_mode      posix_compliant
> starter_method        NONE
> suspend_method        NONE
> resume_method         NONE
> terminate_method      NONE
> notify                00:00:60
> owner_list            NONE
> user_lists            NONE
> xuser_lists           NONE
> subordinate_list      NONE
> complex_values        NONE
> projects              NONE
> xprojects             NONE
> calendar              NONE
> initial_state         default
> s_rt                  INFINITY
> h_rt                  INFINITY
> s_cpu                 INFINITY
> h_cpu                 INFINITY
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                INFINITY
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 INFINITY
> h_rss                 INFINITY
> s_vmem                INFINITY
> h_vmem                INFINITY
>
>
> -----Original Message-----
> From: Christopher Heiny [mailto:che...@synaptics.com]
> Sent: Wednesday, December 14, 2016 10:21
> To: John_Tai; Reuti
> Cc: Coleman, Marcus [JRDUS Non-J&J]; users@gridengine.org
> Subject:
Re: John's cores pe (Was: users Digest...)
>
> On Wed, 2016-12-14 at 02:03 +0000, John_Tai wrote:
> > I switched schedd_job_info to true, these are the outputs you
> > requested:
> >
> > # qstat -j 95
> > ==============================================================
> > job_number:                 95
> > exec_file:                  job_scripts/95
> > submission_time:            Tue Dec 13 08:50:34 2016
> > owner:                      johnt
> > uid:                        162
> > group:                      sa
> > gid:                        4563
> > sge_o_home:                 /home/johnt
> > sge_o_log_name:             johnt
> > sge_o_path:                 /home/sge/sge8.1.9-1.el5/bin:/home/sge/sge8.1.9-1.el5/bin/lx-amd64:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/home/johnt/bin:.
> > sge_o_shell:                /bin/tcsh
> > sge_o_workdir:              /home/johnt/sge8
> > sge_o_host:                 ibm005
> > account:                    sge
> > cwd:                        /home/johnt/sge8
> > mail_list:                  johnt@ibm005
> > notify:                     FALSE
> > job_name:                   xclock
> > jobshare:                   0
> > hard_queue_list:            all.q@ibm038
> > env_list:                   TERM=xterm,DISPLAY=dsls11:3.0,HOME=/home/johnt,SHELL=/bin/tcsh,USER=johnt,LOGNAME=johnt,PATH=/home/sge/sge8.1.9-1.el5/bin:/home/sge/sge8.1.9-1.el5/bin/lx-amd64:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/home/johnt/bin:.,HOSTTYPE=x86_64-linux,VENDOR=unknown,OSTYPE=linux,MACHTYPE=x86_64,SHLVL=1,PWD=/home/johnt/sge8,GROUP=sa,HOST=ibm005,REMOTEHOST=dsls11,MAIL=/var/spool/mail/johnt,LS_COLORS=no=00:fi=00:di=00;36:ln=00;34:pi=40;33:so=01;31:bd=40;33:cd=40;33:or=40;31:ex=00;31:*.tar=00;33:*.tgz=00;33:*.zip=00;33:*.bz2=00;33:*.z=00;33:*.Z=00;33:*.gz=00;33:*.ev=00;41,G_BROKEN_FILENAMES=1,SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass,KDE_IS_PRELINKED=1,KDEDIR=/usr,LANG=en_US.UTF-8,LESSOPEN=|/usr/bin/lesspipe.sh %s,HOSTNAME=ibm005,INPUTRC=/etc/inputrc,ASSURA_AUTO_64BIT=NONE,EDITOR=vi,TOP=-ores 60,CVSROOT=/home/edamgr/CVSTF,OPERA_PLUGIN_PATH=/usr/java/jre1.5.0_01/plugin/i386/ns7,NPX_PLUGIN_PATH=/usr/java/jre1.5.0_01/plugin/i386/ns7,MANPATH=/home/sge/sge8.1.9-1.el5/man:/usr/share/man:/usr/X11R6/man:/usr/kerberos/man,LD_LIBRARY_PATH=/usr/lib:/usr/local/lib:/usr/lib64:/usr/local/lib64,MGC_HOME=/home/eda/mentor/aoi_cal_2015.3_25.16,CALIBRE_LM_LOG_LEVEL=WARN,MGLS_LICENSE_FILE=1717@ibm004:1717@ibm005:1717@ibm041:1717@ibm042:1717@ibm043:1717@ibm033:1717@ibm044:1717@td156:1717@td158:1717@ATD222,MGC_CALGUI_RELEASE_LICENSE_TIME=0.5,MGC_RVE_RELEASE_LICENSE_TIME=0.5,SOSCAD=/cad,EDA_TOOL_SETUP_ROOT=/cad/toolSetup,EDA_TOOL_SETUP_VERSION=1.0,SGE_ROOT=/home/sge/sge8.1.9-1.el5,SGE_ARCH=lx-amd64,SGE_CELL=cell2,SGE_CLUSTER_NAME=p6444,SGE_QMASTER_PORT=6444,SGE_EXECD_PORT=6445,DRMAA_LIBRARY_PATH=/home/sge/sge8.1.9-1.el5/lib//libdrmaa.so
> > script_file:                xclock
> > parallel environment:  cores range: 1
> > binding:                    NONE
> > job_type:                   binary
> > scheduling info:            cannot run in queue "pc.q" because it is not contained in its hard queue list (-q)
> >                             cannot run in queue "sim.q" because it is not contained in its hard queue list (-q)
> >                             cannot run in queue "all.q@ibm021" because it is not contained in its hard queue list (-q)
> >                             cannot run in PE "cores" because it only offers 0 slots
>
> Hmmmm. Just a wild idea, but I'm thinking maybe there's something wacky about ibm038's particular configuration. What does
>     qconf -sq all.q@ibm038
> say?
>
> And what happens if you use this qsub command?
>     qsub -V -b y -cwd -now n -pe cores 2 -q all.q xclock
>
> Cheers,
>     Chris
>
>
> ________________________________
>
> This email (including its attachments, if any) may be confidential and proprietary information of SMIC, and intended only for the use of the named recipient(s) above. Any unauthorized use or disclosure of this email is strictly prohibited. If you are not the intended recipient(s), please notify the sender immediately and delete this email from your computer.
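[Editor's note: with `schedd_job_info` set to true, as John did above, the `scheduling info:` field of `qstat -j` lists every reason the scheduler skipped a queue or PE. When comparing several hosts it can help to pull those reasons out of captured output programmatically; a throwaway helper to that end, my own sketch rather than part of any SGE tooling:]

```python
def blocking_reasons(qstat_j_text):
    """Collect the reasons listed under 'scheduling info:' in captured
    `qstat -j <jobid>` output. The field is the last one qstat prints,
    so every non-empty line after it is treated as a further reason."""
    reasons = []
    in_info = False
    for line in qstat_j_text.splitlines():
        stripped = line.lstrip("> ").strip()  # tolerate e-mail quote markers
        if stripped.startswith("scheduling info:"):
            in_info = True
            rest = stripped.split(":", 1)[1].strip()
            if rest:
                reasons.append(rest)
        elif in_info and stripped:
            reasons.append(stripped)
    return reasons

sample = '''job_number:                 95
scheduling info:            cannot run in queue "pc.q" because it is not contained in its hard queue list (-q)
                            cannot run in PE "cores" because it only offers 0 slots'''
print(blocking_reasons(sample)[-1])  # cannot run in PE "cores" because it only offers 0 slots
```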
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users