I have pinpointed the problem, but I don't know how to solve it. It looks like hosts with a complex_values entry for virtual_free cannot run jobs that request a PE, even though the jobs do not request the virtual_free complex at all. I set up virtual_free so that jobs can request RAM; the goal is for each job to request both RAM and a number of CPU cores. Hopefully this helps in figuring out a solution. Thanks.
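For reference, the end result I'm after is a submission along the following lines. The 4-slot count and the 8G figure are only example values, and this assumes virtual_free is defined as a consumable in the complex configuration (which can be checked with qconf -sc):

  # show how virtual_free is defined (consumable flag, default value, etc.)
  qconf -sc | grep virtual_free

  # request 4 slots from the "cores" PE plus 8G of RAM;
  # as I understand it, a consumable request is counted per slot,
  # so this would reserve 4 x 8G on whichever host is chosen
  qsub -V -b y -cwd -now n -pe cores 4 -l virtual_free=8G xclock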
Here's an example of one host that doesn't work:

# qconf -se ibm038
hostname              ibm038
load_scaling          NONE
complex_values        virtual_free=16G

# qsub -V -b y -cwd -now n -pe cores 7 -q all.q@ibm038 xclock
Your job 143 ("xclock") has been submitted

# qstat -j 143
==============================================================
job_number:                 143
exec_file:                  job_scripts/143
submission_time:            Fri Dec 16 10:46:02 2016
owner:                      johnt
uid:                        162
group:                      sa
gid:                        4563
sge_o_home:                 /home/johnt
sge_o_log_name:             johnt
sge_o_path:                 /home/sge/sge8.1.9-1.el5/bin:/home/sge/sge8.1.9-1.el5/bin/lx-amd64:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/home/johnt/bin:.
sge_o_shell:                /bin/tcsh
sge_o_workdir:              /home/johnt/sge8
sge_o_host:                 ibm005
account:                    sge
cwd:                        /home/johnt/sge8
mail_list:                  johnt@ibm005
notify:                     FALSE
job_name:                   xclock
jobshare:                   0
hard_queue_list:            all.q@ibm038
env_list:                   TERM=xterm,DISPLAY=dsls11:3.0,HOME= [..]
script_file:                xclock
parallel environment:  cores range: 7
binding:                    NONE
job_type:                   binary
scheduling info:            cannot run in PE "cores" because it only offers 0 slots

Here's an example of a host that does work:

# qconf -se ibm037
hostname              ibm037
load_scaling          NONE
complex_values        NONE

# qsub -V -b y -cwd -now n -pe cores 7 -q all.q@ibm037 xclock
Your job 144 ("xclock") has been submitted

# qstat -j 144
==============================================================
job_number:                 144
exec_file:                  job_scripts/144
submission_time:            Fri Dec 16 10:49:35 2016
owner:                      johnt
uid:                        162
group:                      sa
gid:                        4563
sge_o_home:                 /home/johnt
sge_o_log_name:             johnt
sge_o_path:                 /home/sge/sge8.1.9-1.el5/bin:/home/sge/sge8.1.9-1.el5/bin/lx-amd64:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/home/johnt/bin:.
sge_o_shell:                /bin/tcsh
sge_o_workdir:              /home/johnt/sge8
sge_o_host:                 ibm005
account:                    sge
cwd:                        /home/johnt/sge8
mail_list:                  johnt@ibm005
notify:                     FALSE
job_name:                   xclock
jobshare:                   0
hard_queue_list:            all.q@ibm037
env_list:                   TERM=xterm,DISPLAY=dsls11:3.0,HOME=/home/johnt [..]
script_file:                xclock
parallel environment:  cores range: 7
binding:                    NONE
job_type:                   binary
usage    1:                 cpu=00:00:00, mem=0.00000 GB s, io=0.00000 GB, vmem=N/A, maxvmem=N/A
binding    1:               NONE

From: users-boun...@gridengine.org [mailto:users-boun...@gridengine.org] On Behalf Of John_Tai
Sent: Wednesday, December 14, 2016 3:52
To: Christopher Heiny
Cc: users@gridengine.org; Coleman, Marcus [JRDUS Non-J&J]
Subject: Re: [gridengine users] John's cores pe (Was: users Digest...)

I'm actually using sge8.1.9-1 for all. Is there a problem with that? Downloaded here:
http://arc.liv.ac.uk/downloads/SGE/releases/8.1.9/

From: Christopher Heiny [mailto:christopherhe...@gmail.com]
Sent: Wednesday, December 14, 2016 3:26
To: John_Tai
Cc: users@gridengine.org; Coleman, Marcus [JRDUS Non-J&J]; Reuti; Christopher Heiny
Subject: Re: [gridengine users] John's cores pe (Was: users Digest...)

On Dec 13, 2016 7:04 PM, "John_Tai" <john_...@smics.com> wrote:

    I have 3 hosts in all.q, it seems the 2 servers running RHEL5.3 (ibm037, ibm038) do not work with PE, while the server with RHEL6.8 (ibm021) is working ok. Their conf are identical:

Hmmmm. Might be a Grid Engine version mismatch issue. If you installed from RH rpms, then I think EL5.3 is on 6.1u4 and EL6.8 is on 6.2u3 or 6.2u5.
# qconf -sq all.q@ibm038
qname                 all.q
hostname              ibm038
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               cores
rerun                 FALSE
slots                 8
tmpdir                /tmp
shell                 /bin/sh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY

-----Original Message-----
From: Christopher Heiny [mailto:che...@synaptics.com]
Sent: Wednesday, December 14, 2016 10:21
To: John_Tai; Reuti
Cc: Coleman, Marcus [JRDUS Non-J&J]; users@gridengine.org
Subject: Re: John's cores pe (Was: users Digest...)

On Wed, 2016-12-14 at 02:03 +0000, John_Tai wrote:
> I switched schedd_job_info to true, these are the outputs you
> requested:
>
> # qstat -j 95
> ==============================================================
> job_number:                 95
> exec_file:                  job_scripts/95
> submission_time:            Tue Dec 13 08:50:34 2016
> owner:                      johnt
> uid:                        162
> group:                      sa
> gid:                        4563
> sge_o_home:                 /home/johnt
> sge_o_log_name:             johnt
> sge_o_path:                 /home/sge/sge8.1.9-
> 1.el5/bin:/home/sge/sge8.1.9-1.el5/bin/lx-
> amd64:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/home/johnt/bin:.
> sge_o_shell:                /bin/tcsh
> sge_o_workdir:              /home/johnt/sge8
> sge_o_host:                 ibm005
> account:                    sge
> cwd:                        /home/johnt/sge8
> mail_list:                  johnt@ibm005
> notify:                     FALSE
> job_name:                   xclock
> jobshare:                   0
> hard_queue_list:            all.q@ibm038
> env_list:                   TERM=xterm,DISPLAY=dsls11:3.0,HOME=/home/
> johnt,SHELL=/bin/tcsh,USER=johnt,LOGNAME=johnt,PATH=/home/sge/sge8.1.
> 9-1.el5/bin:/home/sge/sge8.1.9-1.el5/bin/lx-
> amd64:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/home/johnt/bin:.,H
> OSTTYPE=x86_64-
> linux,VENDOR=unknown,OSTYPE=linux,MACHTYPE=x86_64,SHLVL=1,PWD=/home/j
> ohnt/sge8,GROUP=sa,HOST=ibm005,REMOTEHOST=dsls11,MAIL=/var/spool/mail
> /johnt,LS_COLORS=no=00:fi=00:di=00;36:ln=00;34:pi=40;33:so=01;31:bd=4
> 0;33:cd=40;33:or=40;31:ex=00;31:*.tar=00;33:*.tgz=00;33:*.zip=00;33:*
> .bz2=00;33:*.z=00;33:*.Z=00;33:*.gz=00;33:*.ev=00;41,G_BROKEN_FILENAM
> ES=1,SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-
> askpass,KDE_IS_PRELINKED=1,KDEDIR=/usr,LANG=en_US.UTF-
> 8,LESSOPEN=|/usr/bin/lesspipe.sh
> %s,HOSTNAME=ibm005,INPUTRC=/etc/inputrc,ASSURA_AUTO_64BIT=NONE,EDITOR
> =vi,TOP=-ores
> 60,CVSROOT=/home/edamgr/CVSTF,OPERA_PLUGIN_PATH=/usr/java/jre1.5.0_01
> /plugin/i386/ns7,NPX_PLUGIN_PATH=/usr/java/jre1.5.0_01/plugin/i386/ns
> 7,MANPATH=/home/sge/sg!
> e8.1.9-
> 1.el5/man:/usr/share/man:/usr/X11R6/man:/usr/kerberos/man,LD_LIBRARY_
> PATH=/usr/lib:/usr/local/lib:/usr/lib64:/usr/local/lib64,MGC_HOME=/ho
> me/eda/mentor/aoi_cal_2015.3_25.16,CALIBRE_LM_LOG_LEVEL=WARN,MGLS_LIC
> ENSE_FILE=1717@ibm004:1717@ibm005:1717@ibm041:1717@ibm042:1717@ibm043
> :1717@ibm033:1717@ibm044:1717@td156:1717@td158:1717@ATD222,MGC_CALGUI
> _RELEASE_LICENSE_TIME=0.5,MGC_RVE_RELEASE_LICENSE_TIME=0.5,SOSCAD=/ca
> d,EDA_TOOL_SETUP_ROOT=/cad/toolSetup,EDA_TOOL_SETUP_VERSION=1.0,SGE_R
> OOT=/home/sge/sge8.1.9-1.el5,SGE_ARCH=lx-
> amd64,SGE_CELL=cell2,SGE_CLUSTER_NAME=p6444,SGE_QMASTER_PORT=6444,SGE
> _EXECD_PORT=6445,DRMAA_LIBRARY_PATH=/home/sge/sge8.1.9-
> 1.el5/lib//libdrmaa.so
> script_file:                xclock
> parallel environment:  cores range: 1
> binding:                    NONE
> job_type:                   binary
> scheduling info:            cannot run in queue "pc.q" because it is
>                             not contained in its hard queue list (-q)
>                             cannot run in queue "sim.q" because it is
>                             not contained in its hard queue list (-q)
>                             cannot run in queue "all.q@ibm021"
>                             because it is not contained in its hard queue list (-q)
>                             cannot run in PE "cores" because it only
>                             offers 0 slots

Hmmmm. Just a wild idea, but I'm thinking maybe there's something wacky about ibm038's particular configuration. What does

qconf -sq all.q@ibm038

say? And what happens if you use this qsub command?

qsub -V -b y -cwd -now n -pe cores 2 -q all.q xclock

Cheers,
Chris