On 16.12.2016 at 03:54, John_Tai wrote:

> I have pinpointed the problem, but I don't know how to solve it.
>
> It looks like hosts with a virtual_free complex value cannot run jobs that
> request a PE, even though the job did not request the virtual_free complex.
> I set virtual_free so that jobs can request RAM; the goal is for each job
> to request both RAM and a number of CPU cores. Hopefully this helps in
> figuring out a solution. Thanks.
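[A plausible cause, pending the complex definition that Reuti asks for next:
in Grid Engine, a consumable complex with a non-zero default value is charged
to every job that does not request the resource explicitly, and a per-slot
consumable (consumable=YES) is multiplied by the number of slots a parallel
job is granted. A hypothetical definition illustrating the arithmetic:

    # qconf -sc | egrep '#name|virtual_free'
    #name          shortcut  type    relop  requestable  consumable  default  urgency
    virtual_free   vf        MEMORY  <=     YES          YES         4G       0

With that hypothetical 4G default, a "-pe cores 7" job implicitly requests
7 x 4G = 28G against the 16G in ibm038's complex_values, so the scheduler can
offer 0 slots there even though the job never mentioned virtual_free.]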
What does the definition of the complex look like in `qconf -sc`?

-- Reuti

> Here's an example of one host that doesn't work:
>
> # qconf -se ibm038
> hostname        ibm038
> load_scaling    NONE
> complex_values  virtual_free=16G
>
> # qsub -V -b y -cwd -now n -pe cores 7 -q all.q@ibm038 xclock
> Your job 143 ("xclock") has been submitted
> # qstat -j 143
> ==============================================================
> job_number:            143
> exec_file:             job_scripts/143
> submission_time:       Fri Dec 16 10:46:02 2016
> owner:                 johnt
> uid:                   162
> group:                 sa
> gid:                   4563
> sge_o_home:            /home/johnt
> sge_o_log_name:        johnt
> sge_o_path:            /home/sge/sge8.1.9-1.el5/bin:/home/sge/sge8.1.9-1.el5/bin/lx-amd64:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/home/johnt/bin:.
> sge_o_shell:           /bin/tcsh
> sge_o_workdir:         /home/johnt/sge8
> sge_o_host:            ibm005
> account:               sge
> cwd:                   /home/johnt/sge8
> mail_list:             johnt@ibm005
> notify:                FALSE
> job_name:              xclock
> jobshare:              0
> hard_queue_list:       all.q@ibm038
> env_list:              TERM=xterm,DISPLAY=dsls11:3.0,HOME= [..]
> script_file:           xclock
> parallel environment:  cores range: 7
> binding:               NONE
> job_type:              binary
> scheduling info:       cannot run in PE "cores" because it only offers 0 slots
>
> Here's an example of a host that does work:
>
> # qconf -se ibm037
> hostname        ibm037
> load_scaling    NONE
> complex_values  NONE
>
> # qsub -V -b y -cwd -now n -pe cores 7 -q all.q@ibm037 xclock
> Your job 144 ("xclock") has been submitted
> # qstat -j 144
> ==============================================================
> job_number:            144
> exec_file:             job_scripts/144
> submission_time:       Fri Dec 16 10:49:35 2016
> owner:                 johnt
> uid:                   162
> group:                 sa
> gid:                   4563
> sge_o_home:            /home/johnt
> sge_o_log_name:        johnt
> sge_o_path:            /home/sge/sge8.1.9-1.el5/bin:/home/sge/sge8.1.9-1.el5/bin/lx-amd64:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/home/johnt/bin:.
> sge_o_shell:           /bin/tcsh
> sge_o_workdir:         /home/johnt/sge8
> sge_o_host:            ibm005
> account:               sge
> cwd:                   /home/johnt/sge8
> mail_list:             johnt@ibm005
> notify:                FALSE
> job_name:              xclock
> jobshare:              0
> hard_queue_list:       all.q@ibm037
> env_list:              TERM=xterm,DISPLAY=dsls11:3.0,HOME=/home/johnt [..]
> script_file:           xclock
> parallel environment:  cores range: 7
> binding:               NONE
> job_type:              binary
> usage    1:            cpu=00:00:00, mem=0.00000 GB s, io=0.00000 GB, vmem=N/A, maxvmem=N/A
> binding    1:          NONE
>
>
> From: users-boun...@gridengine.org [mailto:users-boun...@gridengine.org] On Behalf Of John_Tai
> Sent: Wednesday, December 14, 2016 3:52
> To: Christopher Heiny
> Cc: users@gridengine.org; Coleman, Marcus [JRDUS Non-J&J]
> Subject: Re: [gridengine users] John's cores pe (Was: users Digest...)
>
> I'm actually using sge8.1.9-1 on all of them. Is there a problem with that?
> It was downloaded from here:
>
> http://arc.liv.ac.uk/downloads/SGE/releases/8.1.9/
>
>
> From: Christopher Heiny [mailto:christopherhe...@gmail.com]
> Sent: Wednesday, December 14, 2016 3:26
> To: John_Tai
> Cc: users@gridengine.org; Coleman, Marcus [JRDUS Non-J&J]; Reuti; Christopher Heiny
> Subject: Re: [gridengine users] John's cores pe (Was: users Digest...)
>
> On Dec 13, 2016 7:04 PM, "John_Tai" <john_...@smics.com> wrote:
> I have 3 hosts in all.q. It seems the 2 servers running RHEL5.3 (ibm037,
> ibm038) do not work with PE, while the server with RHEL6.8 (ibm021) works
> OK. Their configurations are identical:
>
> Hmmmm. Might be a Grid Engine version mismatch issue. If you installed from
> RH rpms, then I think EL5.3 is on 6.1u4 and EL6.8 is on 6.2u3 or 6.2u5.
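[Assuming a stock SGE 8.1.9 installation, two quick checks would confirm or
rule out the consumable-default hypothesis; the resource name virtual_free is
taken from the thread:

    # qhost -F virtual_free -h ibm038

shows how much of the consumable the scheduler currently sees on the host, and

    # qconf -sc | egrep '#name|virtual_free'

shows the definition, including the consumable and default columns. If the
default turns out to be non-zero, one fix is to set it to 0 with qconf -mc,
so that only jobs explicitly requesting -l virtual_free=... are charged
against the 16G.]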
>
> # qconf -sq all.q@ibm038
> qname                 all.q
> hostname              ibm038
> seq_no                0
> load_thresholds       np_load_avg=1.75
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:05:00
> priority              0
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 BATCH INTERACTIVE
> ckpt_list             NONE
> pe_list               cores
> rerun                 FALSE
> slots                 8
> tmpdir                /tmp
> shell                 /bin/sh
> prolog                NONE
> epilog                NONE
> shell_start_mode      posix_compliant
> starter_method        NONE
> suspend_method        NONE
> resume_method         NONE
> terminate_method      NONE
> notify                00:00:60
> owner_list            NONE
> user_lists            NONE
> xuser_lists           NONE
> subordinate_list      NONE
> complex_values        NONE
> projects              NONE
> xprojects             NONE
> calendar              NONE
> initial_state         default
> s_rt                  INFINITY
> h_rt                  INFINITY
> s_cpu                 INFINITY
> h_cpu                 INFINITY
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                INFINITY
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 INFINITY
> h_rss                 INFINITY
> s_vmem                INFINITY
> h_vmem                INFINITY
>
> -----Original Message-----
> From: Christopher Heiny [mailto:che...@synaptics.com]
> Sent: Wednesday, December 14, 2016 10:21
> To: John_Tai; Reuti
> Cc: Coleman, Marcus [JRDUS Non-J&J]; users@gridengine.org
> Subject: Re: John's cores pe (Was: users Digest...)
>
> On Wed, 2016-12-14 at 02:03 +0000, John_Tai wrote:
> > I switched schedd_job_info to true; these are the outputs you
> > requested:
> >
> > # qstat -j 95
> > ==============================================================
> > job_number:            95
> > exec_file:             job_scripts/95
> > submission_time:       Tue Dec 13 08:50:34 2016
> > owner:                 johnt
> > uid:                   162
> > group:                 sa
> > gid:                   4563
> > sge_o_home:            /home/johnt
> > sge_o_log_name:        johnt
> > sge_o_path:            /home/sge/sge8.1.9-1.el5/bin:/home/sge/sge8.1.9-1.el5/bin/lx-amd64:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/home/johnt/bin:.
> > sge_o_shell:           /bin/tcsh
> > sge_o_workdir:         /home/johnt/sge8
> > sge_o_host:            ibm005
> > account:               sge
> > cwd:                   /home/johnt/sge8
> > mail_list:             johnt@ibm005
> > notify:                FALSE
> > job_name:              xclock
> > jobshare:              0
> > hard_queue_list:       all.q@ibm038
> > env_list:              TERM=xterm,DISPLAY=dsls11:3.0,HOME=/home/johnt,SHELL=/bin/tcsh,USER=johnt,LOGNAME=johnt,
> >                        PATH=/home/sge/sge8.1.9-1.el5/bin:/home/sge/sge8.1.9-1.el5/bin/lx-amd64:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/home/johnt/bin:.,
> >                        HOSTTYPE=x86_64-linux,VENDOR=unknown,OSTYPE=linux,MACHTYPE=x86_64,SHLVL=1,PWD=/home/johnt/sge8,GROUP=sa,HOST=ibm005,REMOTEHOST=dsls11,
> >                        MAIL=/var/spool/mail/johnt,LS_COLORS=no=00:fi=00:di=00;36:ln=00;34:pi=40;33:so=01;31:bd=40;33:cd=40;33:or=40;31:ex=00;31:*.tar=00;33:*.tgz=00;33:*.zip=00;33:*.bz2=00;33:*.z=00;33:*.Z=00;33:*.gz=00;33:*.ev=00;41,
> >                        G_BROKEN_FILENAMES=1,SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass,KDE_IS_PRELINKED=1,KDEDIR=/usr,LANG=en_US.UTF-8,LESSOPEN=|/usr/bin/lesspipe.sh %s,
> >                        HOSTNAME=ibm005,INPUTRC=/etc/inputrc,ASSURA_AUTO_64BIT=NONE,EDITOR=vi,TOP=-ores 60,CVSROOT=/home/edamgr/CVSTF,
> >                        OPERA_PLUGIN_PATH=/usr/java/jre1.5.0_01/plugin/i386/ns7,NPX_PLUGIN_PATH=/usr/java/jre1.5.0_01/plugin/i386/ns7,
> >                        MANPATH=/home/sge/sge8.1.9-1.el5/man:/usr/share/man:/usr/X11R6/man:/usr/kerberos/man,
> >                        LD_LIBRARY_PATH=/usr/lib:/usr/local/lib:/usr/lib64:/usr/local/lib64,
> >                        MGC_HOME=/home/eda/mentor/aoi_cal_2015.3_25.16,CALIBRE_LM_LOG_LEVEL=WARN,
> >                        MGLS_LICENSE_FILE=1717@ibm004:1717@ibm005:1717@ibm041:1717@ibm042:1717@ibm043:1717@ibm033:1717@ibm044:1717@td156:1717@td158:1717@ATD222,
> >                        MGC_CALGUI_RELEASE_LICENSE_TIME=0.5,MGC_RVE_RELEASE_LICENSE_TIME=0.5,SOSCAD=/cad,
> >                        EDA_TOOL_SETUP_ROOT=/cad/toolSetup,EDA_TOOL_SETUP_VERSION=1.0,
> >                        SGE_ROOT=/home/sge/sge8.1.9-1.el5,SGE_ARCH=lx-amd64,SGE_CELL=cell2,SGE_CLUSTER_NAME=p6444,
> >                        SGE_QMASTER_PORT=6444,SGE_EXECD_PORT=6445,DRMAA_LIBRARY_PATH=/home/sge/sge8.1.9-1.el5/lib//libdrmaa.so
> > script_file:           xclock
> > parallel environment:  cores range: 1
> > binding:               NONE
> > job_type:              binary
> > scheduling info:       cannot run in queue "pc.q" because it is not contained in its hard queue list (-q)
> >                        cannot run in queue "sim.q" because it is not contained in its hard queue list (-q)
> >                        cannot run in queue "all.q@ibm021" because it is not contained in its hard queue list (-q)
> >                        cannot run in PE "cores" because it only offers 0 slots
>
> Hmmmm. Just a wild idea, but I'm thinking maybe there's something wacky
> about ibm038's particular configuration. What does
>     qconf -sq all.q@ibm038
> say?
>
> And what happens if you use this qsub command?
>     qsub -V -b y -cwd -now n -pe cores 2 -q all.q xclock
>
> Cheers,
>     Chris
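[If the consumable-default hypothesis above holds, one workaround sketch until
the complex definition is changed is to request the memory explicitly at a
per-slot value that fits the host; the 2G figure below is hypothetical:

    # qsub -V -b y -cwd -now n -pe cores 7 -l virtual_free=2G -q all.q@ibm038 xclock

With consumable=YES the 2G is charged once per slot, so the job consumes
7 x 2G = 14G of the virtual_free=16G configured on ibm038 and should be
schedulable there.]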