dmap/
--
Rémi Palancher
Rackslab: Open Source Solutions for HPC Operations
https://rackslab.io
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
ram must either trap SIGTERM with a signal handler, or you must enable the
send_user_signal PreemptParameters flag and submit your job with --signal
set to another signal.
--
> sbatch: error: Batch job submission failed: Requested node configuration is not available
Do you have MaxMemPerCPU set on the cluster or on the partition? If this
value is too low, Slurm may increase the CPU count of the job to satisfy its
memory request, and the job can then fail due to CPU count limits.
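The effect can be sketched with a simplified Python model of the behaviour documented for MaxMemPerCPU in slurm.conf: when the memory requested per CPU exceeds the limit, the CPU count is raised so that no CPU carries more than MaxMemPerCPU. The function names are hypothetical, and the real Slurm logic handles more cases (for example --mem requests and partition-level limits).

```python
import math


def adjusted_cpus_per_task(cpus_per_task, mem_per_cpu_mb, max_mem_per_cpu_mb):
    """Simplified model: raise the CPU count so memory per CPU <= MaxMemPerCPU."""
    if mem_per_cpu_mb <= max_mem_per_cpu_mb:
        return cpus_per_task
    total_mem_mb = cpus_per_task * mem_per_cpu_mb
    return math.ceil(total_mem_mb / max_mem_per_cpu_mb)


def fits_on_node(cpus_per_task, mem_per_cpu_mb, max_mem_per_cpu_mb, node_cpus):
    """True if the adjusted CPU count still fits on a single node."""
    needed = adjusted_cpus_per_task(cpus_per_task, mem_per_cpu_mb, max_mem_per_cpu_mb)
    return needed <= node_cpus
```

For example, a single-CPU job asking for 64000 MB with MaxMemPerCPU=2000 is raised to 32 CPUs, which no longer fits on a 16-CPU node.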
--
https://github.com/SchedMD/slurm/commit/b31fa177c1ca26dcd2d5cd952e692ef87d95b528
--
tudents names=teacher
Then teacher will be able to cancel jobs of the students account, among other
things (e.g. setting limits on the students associations). It will not have
any special privileges on other accounts.
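The scope of coordinator privileges can be pictured with a small Python model. It assumes, as a simplification, that a coordinator's realm is the coordinated accounts plus their sub-accounts; the account tree (including the students_lab sub-account) and the coordinator table are hypothetical.

```python
# Hypothetical account tree: account -> parent account (None for the root).
PARENT = {"root": None, "students": "root", "students_lab": "students", "teachers": "root"}

# Hypothetical coordinator table: user -> accounts they coordinate.
COORDINATORS = {"teacher": {"students"}}


def in_realm(account, coordinated):
    """True if account is a coordinated account or one of its descendants."""
    while account is not None:
        if account in coordinated:
            return True
        account = PARENT.get(account)
    return False


def can_cancel(user, job_account):
    """Simplified check: may `user` cancel a job charged to `job_account`?"""
    return in_realm(job_account, COORDINATORS.get(user, set()))
```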
--
ize the list of accounts that users coordinate:
$ sacctmgr show users WithCoord
--
r DRMAA layer against Slurm 21.08.8 headers and
library?
--
> Weight value and they will be added to the pool of nodes being
> considered for scheduling individually.
[1] https://github.com/SchedMD/slurm/blob/10b6d5122b77eae417546d5263757d0ed1b2fd31/src/common/read_config.c#L1667
[2] https://slurm.schedmd.com/slurm.conf.html#OPT_Weight
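The rule documented for Weight in slurm.conf [2], that all things being equal jobs are allocated the lowest-weight nodes satisfying their requirements, can be sketched in Python as below. This is a simplified model with a hypothetical Node type that only considers the CPU count; Slurm's real node selection takes many more constraints into account.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpus: int
    weight: int


def pick_nodes(nodes, cpus_needed, count):
    """Return `count` node names with enough CPUs, preferring the lowest Weight."""
    eligible = [n for n in nodes if n.cpus >= cpus_needed]
    # sorted() is stable, so nodes with equal weight keep their configured order.
    eligible = sorted(eligible, key=lambda n: n.weight)
    if len(eligible) < count:
        return None
    return [n.name for n in eligible[:count]]
```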
--
.com/slurm.conf.html#SECTION_NODE-CONFIGURATION
--
announcement can still be found in the archives of this
mailing list! [1]
[1] https://groups.google.com/g/slurm-users/c/LiD2Pa8r22A/m/fDHWm5GomJsJ
[2] https://www.edf.fr/en
[3] https://rackslab.io
--
oc field. There might not be NSS resolution in the output.
Did the UID of phywht change over time? That would explain why the jobs are
associated with this user in the SlurmDBD database.
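The hypothesis can be illustrated with a small Python model, assuming the accounting records keep only the numeric UID and that the user name is resolved against the current UID-to-name mapping when the output is produced. The job IDs, the UID 5001 and the former owner name olduser are hypothetical.

```python
# Hypothetical accounting records: only the numeric UID is stored per job.
JOBS = [{"jobid": 101, "uid": 5001}, {"jobid": 202, "uid": 5001}]


def owner_names(jobs, uid_to_name):
    """Resolve each stored UID with the mapping in effect now (raw UID if unknown)."""
    return {job["jobid"]: uid_to_name.get(job["uid"], str(job["uid"])) for job in jobs}


# Mapping when the jobs ran, then after UID 5001 was reassigned to phywht.
before = {5001: "olduser"}
after = {5001: "phywht"}
print(owner_names(JOBS, before))  # {101: 'olduser', 202: 'olduser'}
print(owner_names(JOBS, after))   # {101: 'phywht', 202: 'phywht'}
```

The same stored records show a different owner once the mapping changes, which is the situation the question above is probing for.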
--
ec fields from the cluster
step_table. The total is computed as the sum of these fields, as you can
see here:
https://github.com/SchedMD/slurm/blob/fd6fef3e14a0c6d1484230744289749c0e4b19d0/src/plugins/accounting_storage/mysql/as_mysql_jobacct_process.c#L1063
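Assuming the fields in question are the user and system CPU times, each stored as a seconds part and a microseconds part, the computed total can be sketched in Python as follows. The parameter names are illustrative and are not claimed to match the exact column names of step_table.

```python
def total_cpu_seconds(user_sec, user_usec, sys_sec, sys_usec):
    """Sum user and system CPU time, each split into seconds and microseconds."""
    total_usec = (user_sec + sys_sec) * 1_000_000 + user_usec + sys_usec
    return total_usec / 1_000_000
```

For example, 10.5 s of user time plus 2.75 s of system time gives a total of 13.25 s.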
Best,
--
ere:
https://bugs.schedmd.com/show_bug.cgi?id=3094
Best,
--
In addition to Michael's proposal with the partitions, you could also set up a
QOS for low memory jobs, with a high priority and a MaxTRESPerJob memory limit.
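The limit side of such a QOS can be modelled in Python as below: a job respects the QOS only if each requested TRES stays within its MaxTRESPerJob value, and TRES without a limit are unbounded. The QOS name, priority and 4096 MB memory cap are hypothetical values, and the sketch ignores every other QOS limit.

```python
# Hypothetical QOS: high priority, per-job memory cap in MB.
LOWMEM_QOS = {"name": "lowmem", "priority": 1000, "max_tres_per_job": {"mem": 4096}}


def within_max_tres_per_job(qos, requested):
    """True if every requested TRES is within the per-job limit (unset = unlimited)."""
    limits = qos["max_tres_per_job"]
    return all(amount <= limits.get(tres, float("inf")) for tres, amount in requested.items())
```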
--
e UID
of the shell. The second command resolves johndoe's UID through the nsswitch
stack, then looks up the groups of this UID.
Do you have johndoe declared in both the local /etc/passwd and the LDAP
directory with different UIDs?
Do `id` and `id johndoe` return the same UID?
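The same comparison can be scripted in Python, run from johndoe's shell: os.getuid() gives the UID of the current process (what `id` without argument reports), while pwd.getpwnam() resolves the name through NSS (what `id johndoe` relies on). The function name is hypothetical, and the group lists only approximate what `id` prints.

```python
import os
import pwd
import sys


def compare_identity(username):
    """Compare the current process identity with the NSS view of `username`."""
    entry = pwd.getpwnam(username)  # lookup through the nsswitch stack
    return {
        "shell_uid": os.getuid(),  # like `id` with no argument
        "nss_uid": entry.pw_uid,   # like `id username`
        "shell_groups": sorted(os.getgroups()),  # supplementary groups of this process
        "nss_groups": sorted(os.getgrouplist(username, entry.pw_gid)),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    info = compare_identity(sys.argv[1])
    if info["shell_uid"] != info["nss_uid"]:
        print(f"UID mismatch: shell={info['shell_uid']} nss={info['nss_uid']}")
```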
--
to handle it gracefully [1].
[1] https://slurm.schedmd.com/high_throughput.html
--
l reason is the absence of a time limit on the running jobs.
In this case Slurm is unable to determine when the running jobs will end, and
therefore when the next highest priority job can start. As a consequence, it
is unable to determine whether lower priority jobs would actually delay higher
priority jobs.
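The reasoning can be sketched with a simplified Python model of backfill planning. Running jobs are (cpus, end_time) pairs where end_time is None when the job has no time limit; the function names are hypothetical and the model ignores everything except CPU counts.

```python
import math


def reserved_start(now, free_cpus, running, cpus_needed):
    """Earliest start of the top-priority job, or None if it cannot be computed."""
    if free_cpus >= cpus_needed:
        return now
    # Jobs without a time limit sort last: their end time is unknown.
    by_end = sorted(running, key=lambda job: math.inf if job[1] is None else job[1])
    for cpus, end in by_end:
        if end is None:
            # From here on, no running job has a known end time.
            return None
        free_cpus += cpus
        if free_cpus >= cpus_needed:
            return end
    return None


def delays_top_job(now, candidate_limit, start_of_top):
    """True/False if known; None when the top job's start time is unknown."""
    if start_of_top is None:
        return None
    return now + candidate_limit > start_of_top
```

With one 8-CPU job ending at t=100 and another 8-CPU job without a time limit, a top job needing 8 CPUs gets a reservation at t=100, but one needing 16 CPUs gets none, so Slurm cannot tell whether a lower priority job would delay it.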
--
n control over the exact list of
reserved CPUs with respect to NUMA topology or other constraints.
--
Hi there,
On 13/11/2017 at 18:18, Nicholas McCollum wrote:
> Now that there is a slurm-users mailing list, I thought I would share
> something with the community that I have been working on to see if anyone else
> is interested in it. I have a lot of students on my cluster and I really
> wanted a way