Good morning.
I’m struggling to get strigger working correctly.
My end goal sounds fairly simple: to get a mail notification if a node gets set
into ‘drain’ mode.
The man page for strigger states that it must be run by the configured SlurmUser, which here is slurm:
# scontrol show config | grep SlurmUser
SlurmUser               = slurm
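Since strigger has to run as that user, checking which triggers are already registered looks like this (assuming sudo access to the slurm account is available):

sudo -u slurm strigger --get    # show currently registered triggers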
Thank you to both Kilian and Chris.
I now have the following running on the slurm server to report once when any of the nodes goes into the "Drain" state:

sudo -u slurm bash -c "strigger --set -D -p /etc/slurm/triggers/slurm_admin_notify --flags=perm"

with /etc/slurm/triggers/slurm_admin_notify containing:

/bin/mail -s "ClusterName DrainedNode:$*" our_admin_email_address
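For completeness, a minimal sketch of the trigger program itself; the shebang, the stdin redirect, and the executable-bit note are assumptions rather than quotes from the thread, and slurmctld passes the affected node name(s) as the program's argument, which is what $* expands to:

#!/bin/bash
# /etc/slurm/triggers/slurm_admin_notify
# Invoked by slurmctld (as SlurmUser) with the drained node name(s) as arguments.
/bin/mail -s "ClusterName DrainedNode:$*" our_admin_email_address < /dev/null

The file also has to be executable by the slurm user (chmod +x /etc/slurm/triggers/slurm_admin_notify).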
On Jul 26, 2019, at 11:28 AM, Jeffrey Frey <f...@udel.edu> wrote:
If you check the source code (src/slurmctld/job_mgr.c) this error is i
Good morning.
I’m wondering if someone could point me in the right direction to fulfill a request on one of our small clusters.
Cluster info:
* 5 nodes with 4 GPUs / 28 CPUs per node.
* One user will submit only to CPUs; the other 8 users will submit to GPUs.
* Only one account in the database with
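One possible direction, sketched here rather than taken from the thread: two overlapping partitions on the same nodes, with access separated by Unix group. The node, partition, and group names below are placeholders:

# slurm.conf (fragment)
GresTypes=gpu
NodeName=node[1-5] CPUs=28 Gres=gpu:4 State=UNKNOWN
PartitionName=cpu Nodes=node[1-5] AllowGroups=cpuusers MaxTime=INFINITE State=UP
PartitionName=gpu Nodes=node[1-5] AllowGroups=gpuusers MaxTime=INFINITE State=UP

# gres.conf (fragment)
NodeName=node[1-5] Name=gpu File=/dev/nvidia[0-3]

AllowAccounts or AllowQos on the partition lines would be alternatives if splitting users by Unix group is not convenient.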
Good morning.
I am having the same experience here. Wondering if you found a resolution?
Thank you.
Jodie
On Jun 11, 2020, at 3:27 PM, Rhian Resnick <rresn...@fau.edu> wrote:
We have several users submitting single GPU jobs to our cluster. We expected
the jobs to fill each node and fu
…not schedule any more jobs to the GPUs. We needed to disable binding in the job submission to schedule to all of them.
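If GRES-to-CPU binding is what keeps further jobs off the remaining GPUs, the submission-side switch for this is --gres-flags=disable-binding (a standard sbatch/srun option); a minimal example, with job.sh standing in for whatever batch script is used:

sbatch --gres=gpu:1 --gres-flags=disable-binding job.sh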
Not sure that applies in your situation (don't know your system), but it's
something to check?
Tina
On 07/08/2020 15:42, Jodie H. Sprouse wrote:
> Good morning.
> I ha
…CPUs=14-27
Name=gpu Type=tesla File=/dev/nvidia3 CPUs=14-27
to 'assign' all GPUs to the first 14 CPUs or the second 14 CPUs (your config makes me think there are two 14-core CPUs, so cores 0-13 would probably be CPU 1, etc.?)
(What is the actual topology of the system (according to, say
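Two stock commands answer that topology question directly (standard tools, not taken from the truncated message above):

lscpu | grep NUMA     # which core ranges belong to which socket / NUMA node
nvidia-smi topo -m    # CPU affinity of each GPU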
got 2 jobs currently running on each node that’s available.
>
> So maybe:
>
> NodeName=c0005 Name=gpu File=/dev/nvidia[0-3] CPUs=0-10,11-21,22-32,33-43
>
> would work?
>
>> On Aug 7, 2020, at 12:40 PM, Jodie H. Sprouse wrote:
>>
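If the single-line form with per-GPU core ranges does not parse as hoped, the same mapping can be spelled out one device per line; the node name and core ranges below are copied from the suggestion above, and whether they match the real hardware still needs to be checked against the actual topology:

NodeName=c0005 Name=gpu File=/dev/nvidia0 CPUs=0-10
NodeName=c0005 Name=gpu File=/dev/nvidia1 CPUs=11-21
NodeName=c0005 Name=gpu File=/dev/nvidia2 CPUs=22-32
NodeName=c0005 Name=gpu File=/dev/nvidia3 CPUs=33-43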