Hi Joakim,
one more thing to mention:
On 11.05.2020 at 19:23, Joakim Hove wrote:
ubuntu@ip-172-31-80-232:/var/run/slurm-llnl$ scontrol show node
NodeName=ip-172-31-80-232 Arch=x86_64 CoresPerSocket=1
Reason=Low RealMemory [root@2020-05-11T16:20:02]
The "State=IDLE+DRAIN" looks a bit susp
Hi Erik,
the output of the task-prolog is sourced/evaluated (not really sure how) in
the job environment.
Thus you don't export a variable in the task-prolog directly, but echo the
export, e.g.
echo export TMPDIR=/scratch/$SLURM_JOB_ID
The variable will then be set in the job environment.
Best
Marcu
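A minimal TaskProlog sketch along those lines (the /scratch path is only an example, and it assumes the job owner may create directories there; if not, the directory can instead be created in the root-run Prolog, as in Roger's message further down):

    #!/bin/bash
    # TaskProlog runs as the job user; any "export NAME=value" line it
    # prints to stdout is added to the task's environment.
    mkdir -p "/scratch/${SLURM_JOB_ID}"
    echo "export TMPDIR=/scratch/${SLURM_JOB_ID}"
    # the thread also mentions SLURM_TMPDIR; it can be set the same way:
    echo "export SLURM_TMPDIR=/scratch/${SLURM_JOB_ID}"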
Maybe too obvious, but have you checked your .bashrc, .bash_profile and
such?
Brian Andrus
On 5/12/2020 10:27 AM, Ellestad, Erik wrote:
Which SLURM prolog specifically?
I’m not finding that to work for me in either task-prolog or prolog.
SLURM_TMPDIR and TMPDIR are still both set to /tmp when I run a job.
What do you get from
sacct -o jobid,elapsed,reason,exit -j 533900,533902
On Tue, May 12, 2020 at 4:12 PM Alastair Neil wrote:
>
> The log is continuous and has all the messages logged by slurmd on the node
> for all the jobs mentioned, below are the entries from the slurmctld log:
>
>> [2020-0
The log is continuous and has all the messages logged by slurmd on the
node for all the jobs mentioned, below are the entries from the slurmctld
log:
[2020-05-10T00:26:03.097] _slurm_rpc_kill_job: REQUEST_KILL_JOB JobId=533898 uid 1224431221
[2020-05-10T00:26:03.098] email msg to sshr...@maso
Which SLURM prolog specifically?
I'm not finding that to work for me in either task-prolog or prolog.
SLURM_TMPDIR and TMPDIR are still both set to /tmp when I run a job.
Erik
--
Erik Ellestad
Wynton Cluster SysAdmin
UCSF
From: slurm-users On Behalf Of Roger Moye
Sent: Tuesday, May 12, 2020
We had issues getting TMPDIR to work as well. We finally did this in our
prolog:
export SLURM_TMPDIR="/tmp/slurm/${SLURM_JOB_ID}"
This works.
-Roger
From: slurm-users On Behalf Of Ellestad, Erik
Sent: Tuesday, May 12, 2020 10:40 AM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] R
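A sketch of the root-run half of this, assuming the /tmp/slurm layout above: the Prolog (slurm.conf Prolog=...) runs as root on the compute node before the job starts, so it can create and hand over the per-job directory, while the TaskProlog mechanism described earlier exports the variable into the job environment.

    #!/bin/bash
    # Prolog: SLURM_JOB_ID and SLURM_JOB_UID are set in its environment.
    mkdir -p "/tmp/slurm/${SLURM_JOB_ID}"
    chown "${SLURM_JOB_UID}" "/tmp/slurm/${SLURM_JOB_ID}"
    chmod 700 "/tmp/slurm/${SLURM_JOB_ID}"

A matching Epilog that removes the directory keeps the node from accumulating stale per-job scratch space.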
You have defined both of your partitions with “Default=YES”, but Slurm can have
only one default partition. You can see from * on the compute partition in your
sinfo output that Slurm selected that one as the default. When you use srun or
sbatch it will only look at the default partition unless you explicitly request another one with -p/--partition.
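Roughly, and only guessing at the partition names and node assignments from the slurm.conf excerpt further down in the digest, something like:

    # only one partition may carry Default=YES
    PartitionName=compute Nodes=slurm-gpu-2 Default=YES State=UP
    PartitionName=gpu     Nodes=slurm-gpu-1 Default=NO  State=UP

Jobs then land on the gpu partition only when it is requested explicitly, e.g. srun -p gpu --gres=gpu:1 ...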
I wanted to change TMPDIR from /tmp to a per-job directory I create in local
/scratch/$SLURM_JOB_ID (for example).
This bug suggests I should be able to do this in a task-prolog.
https://bugs.schedmd.com/show_bug.cgi?id=2664
However adding the following to task-prolog doesn't seem to affect the
Hi,
We have a cluster with 2 slave nodes. These are the slurm.conf lines
describing nodes and partitions:
NodeName=slurm-gpu-1 NodeAddr=192.168.0.200 Procs=16 Gres=gpu:2 State=UNKNOWN
NodeName=slurm-gpu-2 NodeAddr=192.168.0.124 Procs=1 Gres=gpu:0 State=UNKNOWN
PartitionName=gpu Nodes=slurm-gpu
I see one job cancelled and two jobs failed.
Your slurmd log is incomplete -- it doesn't show the two failed jobs
exiting/failing, so the real error is not here.
It might also be helpful to look through slurmctld's log starting from
when the first job was canceled, looking at any messages mentioning those job IDs.
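For example (the log paths below are only guesses; check SlurmctldLogFile and SlurmdLogFile in slurm.conf for the real locations):

    # controller side: everything slurmctld logged about the jobs in question
    grep -E '533898|533900|533902' /var/log/slurm/slurmctld.log

    # node side: the corresponding window in that node's slurmd log
    grep -E '533898|533900|533902' /var/log/slurm/slurmd.log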
Hello,
Yesterday I instituted job accounting via mysql on my (FreeBSD 11.3)
test cluster. The cluster consists of a machine running
slurmctld+slurmdbd and two running slurmd (slurm version 20.02.1).
After experiencing a slurmdbd core dump when using mysql-5.7.30
(reported on this list on May 5) I
Hi
With the following memory stats on two nodes
[root@hpc slurm]# scontrol show node compute-0-0 | grep Memory
RealMemory=64259 AllocMem=0 FreeMem=63429 Sockets=32 Boards=1
[root@hpc slurm]# scontrol show node compute-0-1 | grep Memory
RealMemory=120705 AllocMem=1024 FreeMem=103051 Sockets=3
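A compact way to compare those figures across all nodes, if useful: sinfo's %m field prints the configured RealMemory and %e the currently free memory, both in MB.

    sinfo -N -o '%N %m %e'   # node name, RealMemory (MB), free memory (MB)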