e/him) Sr. SysAdmin, URCF, Drexel
dw...@drexel.edu 215.571.4335 (o)
For URCF support: urcf-supp...@drexel.edu
https://proteusmaster.urcf.drexel.edu/urcfwiki
github:prehensilecode
researchgate.NET:David-Chin-6
From: slurm-users on behalf of
Chin,
Hi, Xand:
How does adding "ReqMem" to the sacct command change the output?
E.g. on my cluster running Slurm 20.02.7 (on RHEL8), our GPU nodes have
TRESBillingWeights=CPU=0,Mem=0,GRES/gpu=43:
$ sacct --format=JobID%25,State,AllocTRES%50,ReqTRES,ReqMem,ReqCPUS|grep RUNNING
JobID
ensilecode
From: slurm-users on behalf of Ole Holm
Nielsen
Sent: Friday, November 5, 2021 03:26
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Possible to get cluster utilization by partition?
Hi Dave,
On 11/4/21 21:47, Chin,David wrote:
> I am running Slurm 20.02.7. I
Hi,
I am running Slurm 20.02.7. I would like to generate a cluster utilization
report based on the billing TRES, but separated by partition.
I can get full cluster utilization using:
sreport cluster utilization -T billing start=2021-01-01 end=2021-06-30
but it would be useful for understandin
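One rough way to approximate a per-partition figure is to sum billing-seconds from sacct job records. This is only a sketch, not an sreport feature and not necessarily what sreport would compute: it assumes the "billing" value shows up in each job's AllocTRES, and the helper name billing_by_partition is made up for illustration.

```shell
#!/bin/bash
# Sketch: sum billing * elapsed per partition from sacct-style lines of the
# form "Partition|ElapsedRaw|AllocTRES" read on stdin.
# billing_by_partition is a hypothetical helper name, not a Slurm command.
billing_by_partition() {
    awk -F'|' '
    {
        billing = 0
        n = split($3, tres, ",")
        # Pick the billing=N entry out of the AllocTRES list, if present.
        for (i = 1; i <= n; i++) {
            if (tres[i] ~ /^billing=/) {
                sub(/^billing=/, "", tres[i])
                billing = tres[i] + 0
            }
        }
        # ElapsedRaw is in seconds, so this accumulates billing-seconds.
        total[$1] += billing * $2
    }
    END {
        # Report billing-hours per partition.
        for (p in total) printf "%s %.1f\n", p, total[p] / 3600
    }' | sort
}

# Intended use on a cluster (not run here):
#   sacct -a -X -n -P -S 2021-01-01 -E 2021-06-30 \
#         --format=Partition,ElapsedRaw,AllocTRES | billing_by_partition
```

The result is billing-hours consumed per partition. Turning that into a utilization percentage would still require dividing by the billing capacity of each partition over the same period, which this sketch does not attempt.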
Hi, Sean:
Slurm version 20.02.6 (via Bright Cluster Manager)
ProctrackType=proctrack/cgroup
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherParams=UsePss,NoShared
I just skimmed https://bugs.schedmd.com/show_bug.cgi?id=5549 because this job
appeared to have left two slurmstepd zombie
...@drexel.edu
https://proteusmaster.urcf.drexel.edu/urcfwiki
github:prehensilecode
From: slurm-users on behalf of
Chin,David
Sent: Monday, March 15, 2021 13:52
To: Slurm-Users List
Subject: [slurm-users] Job ended with OUT_OF_MEMORY even though MaxRSS and
MaxVMSize are
Hi Michael:
I looked at the Matlab script: it's loading an xlsx file which is 2.9 kB.
There are some "static" arrays allocated with ones() or zeros(), but those use
small subsets (< 10 columns) of the loaded data, and outputs are arrays of
6x10. Certainly there are not 16e9 rows in the original
to
that point.
-Paul Edmon-
On 3/15/2021 1:52 PM, Chin,David wrote:
Hi, all:
I'm trying to understand why a job exited with an error condition. I think it
was actually terminated by Slurm: the job was a Matlab script, and its output was
incomplete.
Here's sacct output:
J
Hi, all:
I'm trying to understand why a job exited with an error condition. I think it
was actually terminated by Slurm: the job was a Matlab script, and its output was
incomplete.
Here's sacct output:
JobID JobName User Partition NodeList Elapsed State Exit
My mistake - from slurm.conf(5):
SrunProlog runs on the node where the "srun" is executing.
i.e. the login node, which explains why the directory is not being created on
the compute node, while the echos work.
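Since SrunProlog runs where srun is invoked, the directory creation has to happen in something that runs on the compute node instead. Below is a minimal sketch written as a TaskProlog-style script. It is not the script from this thread: the BASE_TMP location and the directory naming are assumptions. It relies on the TaskProlog convention that a line of the form "export NAME=value" on stdout is added to the task's environment.

```shell
#!/bin/bash
# Sketch of a TaskProlog-style script; slurmd runs TaskProlog on the compute
# node as the job's user. BASE_TMP is a hypothetical location: point it at
# the site's node-local scratch.
BASE_TMP="${BASE_TMP:-/tmp}"

job_tmpdir() {
    # Use "<array job id>_<task id>" for array tasks, else the plain job ID.
    if [ -n "${SLURM_ARRAY_JOB_ID:-}" ] && [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
        echo "${BASE_TMP}/${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}"
    else
        echo "${BASE_TMP}/${SLURM_JOB_ID:-unknown}"
    fi
}

# Only act when running inside a job.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    dir="$(job_tmpdir)"
    mkdir -p "$dir" && chmod 700 "$dir"
    # TaskProlog: "export NAME=value" on stdout sets NAME in the task environment.
    echo "export TMPDIR=${dir}"
fi
```

A matching cleanup step at the end of the job would remove the same path; that part is left out here.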
--
David Chin, PhD (he/him) Sr. SysAdmin, URCF, Drexel
dw...@drexel.edu
creating the directory in (chmod 1777 for the parent directory is good)
Brian Andrus
On 3/4/2021 9:03 AM, Chin,David wrote:
Hi, Brian:
So, this is my SrunProlog script -- I want a job-specific tmp dir, which makes
for easy cleanup at end of job:
#!/bin/bash
if [[ -z ${SLURM_ARRAY_JOB
o change a particular one (or more), use something like
--export=ALL,MYVAR=othervalue
do 'man srun' and look at the --export option
Brian Andrus
On 3/3/2021 9:28 PM, Chin,David wrote:
ahmet.mer...@uhem.itu.edu.tr wrote:
> Prolog and Ta
shell on the compute node does not have the env variables set.
I use the same prolog script as TaskProlog, which sets it properly for jobs
submitted with sbatch.
Thanks in advance,
Dave Chin
--
David Chin, PhD (he/him) Sr. SysAdmin, URCF, Drexel
dw...@drexel.edu 215.57
Hello, all:
Details:
* slurm 20.02.6
* MariaDB 10.3.17
* RHEL 8.1
I have a fairshare setup. I went through a couple of iterations in testing of
manually creating accounts and users that I later deleted before putting in
what is to be the production setup.
One of the deleted accounts
github:prehensilecode
From: slurm-users on behalf of
Chin,David
Sent: Friday, February 5, 2021 15:47
To: Slurm-Users List
Subject: [slurm-users] sacctmgr archive dump - no dump file produced, and data
not purged?
Hi all:
I have a new cluster, and
Hello all:
I have a QOS defined which has the Flag DenyOnLimit set:
$ sacctmgr show qos foo format=name,flags
      Name                Flags
---------- --------------------
       foo          DenyOnLimit
How can I "unset" that Flag?
I tried "sacctmgr modify qos foo unset Flags=DenyOnLimit",
;s".)
Is there something I am missing?
Thanks,
Dave Chin
--
David Chin, PhD (he/him) Sr. SysAdmin, URCF, Drexel
dw...@drexel.edu 215.571.4335 (o)
For URCF support: urcf-supp...@drexel.edu
https://proteusmaster.urcf.drexel.edu/urcfwiki
github:prehensilecode