Re: [gridengine users] TMPDIR is missing from prolog script (CentOS 7 SGE 8.1.9)

2018-12-06 Thread Derrick Lin
Just some additional info: I confirm that TMPDIR exists in the job environment and that sge_execd does create the TMPDIR path on the designated file system. Here is my test: $ cat sge_dd_allq.sh #!/bin/bash # # It prints the actual path of the job scratch directory. # #$ -j y #$ -cwd #$ -N dd_allq #$ -l m
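A minimal sketch of such a test job, assuming the usual SGE directives (the quoted script is truncated by the archive, so the exact resource request is not reproduced here):

    #!/bin/bash
    # Minimal test job: print the scratch directory that sge_execd created.
    #$ -j y          # merge stderr into stdout
    #$ -cwd          # run from the submission directory
    #$ -N dd_allq    # job name
    echo "TMPDIR is: $TMPDIR"
    ls -ld "$TMPDIR"
    df -h "$TMPDIR"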

[gridengine users] TMPDIR is missing from prolog script (CentOS 7 SGE 8.1.9)

2018-12-06 Thread Derrick Lin
Hi all, We are switching to a CentOS 7 cluster with SGE 8.1.9 installed. We have a prolog script that does XFS disk space allocation according to TMPDIR. However, the prolog script does not receive TMPDIR, which should be created by the scheduler. Other variables such as JOB_ID, PE_HOSTFILE ar
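A hedged sketch of the kind of prolog being described, allocating an XFS project quota on the job's scratch directory; the mount point, the 50g limit, and the use of JOB_ID as the project id are assumptions, not the poster's actual script:

    #!/bin/bash
    # Hypothetical prolog: reserve XFS space for the job's scratch directory.
    # sge_execd is expected to export TMPDIR and JOB_ID before running this.
    if [ -z "$TMPDIR" ]; then
        echo "prolog: TMPDIR is not set for job $JOB_ID" >&2
        exit 1
    fi
    # Placeholders: /scratch is the XFS mount point, 50g the per-job limit.
    xfs_quota -x -c "project -s -p $TMPDIR $JOB_ID" /scratch
    xfs_quota -x -c "limit -p bhard=50g $JOB_ID" /scratch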

Re: [gridengine users] $TMPDIR With MPI Jobs

2018-12-06 Thread Reuti
I found my entry about this: https://arc.liv.ac.uk/trac/SGE/ticket/570 -- Reuti > On 06.12.2018 at 19:03 Reuti wrote: > > Hi, > >> On 06.12.2018 at 18:36 Dan Whitehouse wrote: >> >> Hi, >> I've been running some MPI jobs and I expected that when the job started >> a $TMPDIR would be cr

Re: [gridengine users] $TMPDIR With MPI Jobs

2018-12-06 Thread MacMullan IV, Hugh
Perhaps you can 'wrap' the 'work' in a small script (work.sh), like: #!/bin/bash ## pre-work echo TMP: $TMP OUT=$TMP/env.$JOB_ID.$OMPI_COMM_WORLD_RANK.txt ## WORK env > $OUT 2>&1 ## report and clean up? ls -la $TMP rsync -av $OUT $SGE_CWD_PATH Then use a wrapper.sh job script to 'mpiexec ./work.s
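A sketch of that wrapper approach; everything past the truncation, the PE name, and the slot count are assumptions:

    #!/bin/bash
    # work.sh - runs once per MPI rank and uses that node's scratch space
    echo "TMP on $(hostname): $TMP"
    OUT=$TMP/env.$JOB_ID.$OMPI_COMM_WORLD_RANK.txt
    env > "$OUT" 2>&1
    ls -la "$TMP"
    rsync -av "$OUT" "$SGE_CWD_PATH/"

    #!/bin/bash
    # wrapper.sh - the job script actually submitted with qsub
    #$ -cwd -j y
    #$ -pe mpi 8     # assumed PE name and slot count
    mpiexec ./work.sh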

Re: [gridengine users] $TMPDIR With MPI Jobs

2018-12-06 Thread Reuti
Hi, > On 06.12.2018 at 18:36 Dan Whitehouse wrote: > > Hi, > I've been running some MPI jobs and I expected that when the job started > a $TMPDIR would be created on all of the nodes; however, with our (UGE) > configuration that does not appear to be the case. > > It appears that while on th

[gridengine users] $TMPDIR With MPI Jobs

2018-12-06 Thread Dan Whitehouse
Hi, I've been running some MPI jobs and I expected that when the job started a $TMPDIR would be created on all of the nodes; however, with our (UGE) configuration that does not appear to be the case. It appears that while on the "master" node a $TMPDIR is created and persists for the duration of
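A quick way to see which nodes actually get a per-job scratch directory, assuming Open MPI's mpiexec and a parallel environment named mpi (both placeholders):

    #!/bin/bash
    #$ -cwd -j y
    #$ -pe mpi 16    # assumed PE name and slot count
    # Launch one shell per node and report whether $TMPDIR exists there.
    mpiexec -npernode 1 bash -c 'echo "$(hostname): TMPDIR=$TMPDIR"; ls -ld "$TMPDIR"'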

Re: [gridengine users] problem with concurrent jobs

2018-12-06 Thread MacMullan IV, Hugh
Sweet! Glad to see you’re up and running. -H From: Dimar Jaime González Soto Sent: Thursday, December 6, 2018 11:57 AM To: MacMullan IV, Hugh Subject: Re: [gridengine users] problem with concurrent jobs I disabled that quota and now I can see 60 processes running. Thanks On Thu., Dec 6, 2018

Re: [gridengine users] problem with concurrent jobs

2018-12-06 Thread MacMullan IV, Hugh
Yes, as Reuti said: What’s the output from ‘qconf -srqs’ (line 1 of the max_slots rule)? Looks like you’re being blocked there (RQS). -H From: Dimar Jaime González Soto Sent: Thursday, December 6, 2018 11:22 AM To: MacMullan IV, Hugh Subject: Re: [gridengine users] problem with concurrent jobs
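For reference, a hypothetical resource quota set of the kind qconf -srqs prints; the rule name and the 16-slot value are placeholders. A per-user rule like this caps every user at 16 running slots no matter how many cores are free:

    {
       name         max_slots
       description  "cap the number of slots per user"
       enabled      TRUE
       limit        users {*} to slots=16
    }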

Re: [gridengine users] problem with concurrent jobs

2018-12-06 Thread MacMullan IV, Hugh
Also: I'm surprised 'qalter -w p ' doesn't show any output. Did you forget the JOBID? -H -Original Message- From: users-boun...@gridengine.org On Behalf Of Reuti Sent: Thursday, December 6, 2018 11:04 AM To: Dimar Jaime González Soto Cc: users@gridengine.org Subject: Re: [gridengine u
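For reference, the verification call takes the job id as an argument, e.g. with the id from the qstat output quoted below:

    $ qalter -w p 250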

Re: [gridengine users] problem with concurrent jobs

2018-12-06 Thread Reuti
> On 06.12.2018 at 16:59 Dimar Jaime González Soto wrote: > > qconf -sconf shows: > > #global: > execd_spool_dir /var/spool/gridengine/execd > ... > max_aj_tasks 75000 So, this is fine too. Next place: is the amount of overall slots limited: $ qconf -se global
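A sketch of what to look for there; the value 60 is only a placeholder. A slots entry in complex_values on the global host caps the whole cluster:

    $ qconf -se global | grep complex_values
    complex_values        slots=60
    $ qconf -me global    # raise or remove the slots entry if it is set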

Re: [gridengine users] problem with concurrent jobs

2018-12-06 Thread Dimar Jaime González Soto
qconf -sconf shows: #global: execd_spool_dir /var/spool/gridengine/execd mailer /usr/bin/mail xterm /usr/bin/xterm load_sensor none prolog none epilog none shell_start_mode

Re: [gridengine users] problem with concurrent jobs

2018-12-06 Thread Reuti
> On 06.12.2018 at 15:19 Dimar Jaime González Soto wrote: > > qconf -se ubuntu-node2 : > > hostname ubuntu-node2 > load_scaling NONE > complex_values NONE > load_values arch=lx26-amd64,num_proc=16,mem_total=48201.960938M, \ > sw

Re: [gridengine users] problem with concurrent jobs

2018-12-06 Thread Dimar Jaime González Soto
qconf -se ubuntu-node2 : hostname ubuntu-node2 load_scaling NONE complex_values NONE load_values arch=lx26-amd64,num_proc=16,mem_total=48201.960938M, \ swap_total=95746.996094M,virtual_total=143948.957031M, \ load_avg=3.74,load_shor

Re: [gridengine users] problem with concurrent jobs

2018-12-06 Thread Dimar Jaime González Soto
qhost : HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS --- global - - - - - - - ubuntu-frontend lx26-amd64 16 4.13 31.4

Re: [gridengine users] problem with concurrent jobs

2018-12-06 Thread Reuti
> On 06.12.2018 at 15:07 Dimar Jaime González Soto wrote: > > qalter -w p doesn't show anything, qstat shows 16 processes and not 60: > > 250 0.5 OMA cbuach r 12/06/2018 11:04:15 > main.q@ubuntu-node21 1 > 250 0.5 OMA cbuach

Re: [gridengine users] problem with concurrent jobs

2018-12-06 Thread Dimar Jaime González Soto
qalter -w p doesn't show anything, qstat shows 16 processes and not 60: 250 0.5 OMA cbuach r 12/06/2018 11:04:15 main.q@ubuntu-node21 1 250 0.5 OMA cbuach r 12/06/2018 11:04:15 main.q@ubuntu-node12 1 2 250 0.5000

Re: [gridengine users] problem with concurrent jobs

2018-12-06 Thread Reuti
> On 06.12.2018 at 09:47 Hay, William wrote: > > On Wed, Dec 05, 2018 at 03:29:23PM -0300, Dimar Jaime González Soto wrote: >> the app site is https://omabrowser.org/standalone/ I tried to make a >> parallel environment but it didn't work. > The website indicates that an array job should

Re: [gridengine users] problem with concurrent jobs

2018-12-06 Thread Hay, William
On Wed, Dec 05, 2018 at 03:29:23PM -0300, Dimar Jaime González Soto wrote: > the app site is https://omabrowser.org/standalone/ I tried to make a > parallel environment but it didn't work. The website indicates that an array job should work for this. Has the load average spiked to the point
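A minimal array-job sketch of the kind suggested; the task range, the NR_PROCESSES setting, and the oma invocation are assumptions to be checked against https://omabrowser.org/standalone/:

    #!/bin/bash
    #$ -cwd -j y
    #$ -N oma_array
    #$ -t 1-60                  # 60 independent array tasks
    # Assumption: OMA standalone coordinates concurrently running instances
    # itself; check its documentation for the exact variable or flag it uses.
    export NR_PROCESSES=60
    ./bin/oma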