Re: [slurm-users] How to trap a SIGINT signal in a child process of a batch ?

2020-04-21 Thread Jean-mathieu CHANTREIN
- Original Message - > From: "b h mevik" > To: "slurm-users" > Sent: Tuesday, 21 April 2020 10:29:32 > Subject: Re: [slurm-users] How to trap a SIGINT signal in a child process of a > batch ? > Jean-mathieu CHANTREIN writes: > >> test.s

[slurm-users] How to trap a SIGINT signal in a child process of a batch ?

2020-04-21 Thread Jean-mathieu CHANTREIN
Hello, I'm using Slurm version 19.05.2 on Debian 10. I'm trying to handle a SIGINT signal in a child process of a batch job. The signal is automatically sent 30 s before the end of the time limit. You can see this mechanism in this minimal example: --- test.slurm: #!/bin
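A common pattern for this situation can be sketched as a plain bash script. Everything below is an assumption rather than what the thread finally settled on: it uses SIGUSR1 instead of SIGINT (children started with `&` in a non-interactive shell ignore SIGINT, so a SIGINT trap in the child never fires), requests the warning with `#SBATCH --signal=B:USR1@30` (the `B:` prefix delivers the signal to the batch shell itself), and simulates Slurm's warning with a background `kill` so the sketch runs outside a cluster:

```shell
#!/bin/bash
#SBATCH --signal=B:USR1@30      # ask Slurm for SIGUSR1, 30 s before the limit

sleep 30 &                      # stand-in for the real child computation
child=$!

on_warning() {
    echo "warning signal received, stopping child"
    kill -TERM "$child"         # SIGTERM is not ignored by &-started children
}
trap on_warning USR1

# Outside Slurm, simulate the warning signal arriving after 1 s:
( sleep 1; kill -USR1 $$ ) &

wait "$child" || true               # interrupted by SIGUSR1; the trap runs here
wait "$child" 2>/dev/null || true   # reap the child killed by SIGTERM
result="clean-exit"
echo "$result"
```

In a real job the simulating subshell would be dropped, since Slurm delivers the signal itself; the double `wait` is needed because the first one returns as soon as the trapped signal arrives, before the child has actually exited.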

Re: [slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?

2019-10-09 Thread Jean-mathieu CHANTREIN
- Original Message - > Maybe I missed something else... That's right. Thanks to Bjørn-Helge, who helped me. You must enable swapaccount in the kernel as shown here: https://unix.stackexchange.com/questions/531480/what-does-swapaccount-1-in-grub-cmdline-linux-default-do By default, this is
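For reference, the fix amounts to a kernel boot parameter. A sketch, assuming a stock Debian 10 GRUB setup (the file path and the existing `quiet` option are assumptions):

```
# /etc/default/grub -- add swapaccount=1 to the kernel command line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet swapaccount=1"

# then regenerate grub.cfg and reboot for it to take effect:
#   update-grub
#   reboot
```

Without `swapaccount=1`, the memory cgroup's swap accounting is disabled on Debian, so `ConstrainSwapSpace` in cgroup.conf has nothing to enforce against.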

Re: [slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?

2019-10-08 Thread Jean-mathieu CHANTREIN
Hello, thanks for your answers, > - Does it work if you remove the space in "TaskPlugin=task/affinity, > task/cgroup"? (Slurm can be quite picky when reading slurm.conf). That was the case: I made a mistake when copying/pasting, so there is no space there. > > - See in slurmd.log on the node(s) of the

[slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?

2019-10-07 Thread Jean-mathieu CHANTREIN
Hello, I tried using, in slurm.conf TaskPlugin=task/affinity, task/cgroup SelectTypeParameters=CR_CPU_Memory MemLimitEnforce=yes and in cgroup.conf: CgroupAutomount=yes ConstrainCores=yes ConstrainRAMSpace=yes ConstrainSwapSpace=yes MaxSwapPercent=10 TaskAffinity=no But when the job

[slurm-users] How to preempt job with priority_multifactor parameter ?

2019-06-04 Thread Jean-mathieu CHANTREIN
Hello, is there a way to preempt jobs using the priority of a job calculated with priority/multifactor, and not a priority tied to the partition or QOS? I tried using PreemptType=preempt/partition_prio in slurm.conf, but it doesn't work, and I haven't seen this type of use case in the doc
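To my knowledge (this is not confirmed in the truncated thread), Slurm's preemption plugins pick victims by partition `PriorityTier` (`preempt/partition_prio`) or by QOS (`preempt/qos`), not by the multifactor job priority itself. A minimal `preempt/qos` sketch, where the QOS names are hypothetical:

```
# slurm.conf
PreemptType=preempt/qos
PreemptMode=REQUEUE          # alternatives: CANCEL, or SUSPEND,GANG

# QOS side (run once with sacctmgr); jobs under "high" may then preempt
# jobs running under "low" -- both names are examples:
#   sacctmgr modify qos high set Preempt=low
```

Jobs then get the preempting behaviour through `--qos=high` at submission rather than through their computed multifactor priority.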

Re: [slurm-users] How should I do so that jobs are allocated to the thread and not to the core ?

2019-05-06 Thread Jean-mathieu CHANTREIN
o that jobs are > allocated > to the thread and not to the core ? > Have you seen the Slurm FAQ? > You may want to search that site for "Hyperthreading" > (Sorry for the TOFU. Vacation, mobile) > On 30 April 2019 18:07:03 CEST, Jean-mathieu CHANTREIN wrote <

[slurm-users] How should I do so that jobs are allocated to the thread and not to the core ?

2019-04-30 Thread Jean-mathieu CHANTREIN
Hello, most of my users' jobs are single-threaded, and I have multithreaded processors. The jobs seem to reserve 2 logical CPUs (1 core = 2 CPUs (2 threads)) whereas they only use 1 logical CPU (1 thread). Nevertheless, my slurm.conf file indicates: [...] SelectType=select/cons_res SelectTypeParamet
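The Slurm FAQ's approach to scheduling individual hardware threads can be sketched as follows: advertise each thread as a CPU and select with a `CR_CPU` parameter (the node name, CPU count, and memory figure below are assumptions):

```
# slurm.conf
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory   # CR_CPU_* allocates logical CPUs (threads)

# Give only the total thread count; omitting Sockets/CoresPerSocket/
# ThreadsPerCore keeps Slurm from rounding allocations up to whole cores:
NodeName=n001 CPUs=72 RealMemory=190000 State=UNKNOWN
```

With the full socket/core/thread topology declared instead, `cons_res` treats the core as the smallest allocatable unit, which matches the 2-CPUs-per-job behaviour described above.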

Re: [slurm-users] How to get a summary of the use of compute nodes and/or partition of a cluster in real time ?

2019-04-30 Thread Jean-mathieu CHANTREIN
- Original Message - > From: "Ole Holm Nielsen" > To: "Slurm User Community List" > Sent: Tuesday, 30 April 2019 15:15:33 > Subject: Re: [slurm-users] How to get a summary of the use of compute nodes > and/or partition of a cluster in real time ? > Hi Jean-Mathieu, >

[slurm-users] How to get a summary of the use of compute nodes and/or partition of a cluster in real time ?

2019-04-30 Thread Jean-mathieu CHANTREIN
Hello, do you know a command to get a summary of the use of the compute nodes and/or partitions of a cluster in real time? Something with an output like this: $ sutilization Partition/Node_Name CPU_Use CPU_Total %Use standard 236 564 41.8 % n001 36 72 50 % n002 18 72 25 % ... Reg
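For reference, `sinfo`'s `%C` format token prints CPU counts as Allocated/Idle/Other/Total, which is enough to build such a summary with a little awk. The sketch below hard-codes sample `sinfo -h -o '%P %C'` output (the partition names and numbers are made up) so it runs without a cluster:

```shell
#!/bin/bash
# On a live cluster, replace the hard-coded sample with:
#   sinfo_output=$(sinfo -h -o '%P %C')
sinfo_output="standard 236/322/6/564
gpu 10/54/0/64"

# Turn the A/I/O/T column into "partition allocated total percent-used":
summary=$(printf '%s\n' "$sinfo_output" | awk '{
    split($2, c, "/")                         # c[1]=alloc ... c[4]=total
    printf "%s %d %d %.1f%%\n", $1, c[1], c[4], 100 * c[1] / c[4]
}')
printf '%s\n' "$summary"
```

For per-node rather than per-partition figures, `sinfo -N -h -o '%N %C'` gives the same A/I/O/T column per node and the awk stays unchanged.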

Re: [slurm-users] Array job execution trouble: some jobs in the array fail

2019-01-11 Thread Jean-mathieu CHANTREIN
r00n56.localdomain.hpc.udel.edu >>> [frey@login00 ~]$ ulimit -u >>> 4096 >>> [frey@login00 ~]$ exit >>> : >>> [frey@login00 ~]$ ulimit -u 24 >>> [frey@login00 ~]$ srun ... --propagate=ALL /bin/ba

Re: [slurm-users] Array job execution trouble: some jobs in the array fail

2019-01-11 Thread Jean-mathieu CHANTREIN
> [frey@login00 ~]$ ulimit -u >> 4096 >> [frey@login00 ~]$ exit >> : >> [frey@login00 ~]$ ulimit -u 24 >> [frey@login00 ~]$ srun ... --propagate=ALL /bin/bash >> [frey@login00 ~]$ hostname >> r00n49.loca

[slurm-users] Array job execution trouble: some jobs in the array fail

2019-01-11 Thread Jean-mathieu CHANTREIN
bs at the same time (--array=1-100%10), all jobs succeed. But if I force Slurm to execute only 30 jobs at the same time (--array=1-100%30), some of them fail again. Has anyone ever faced this type of problem? If so, please kindly enlighten me. Regards, Jean-Mathieu Chantrein In char
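The resolution hinted at by the quoted `ulimit -u` experiments above is that sbatch/srun propagate the submitting shell's resource limits into the job, so a low per-user process limit on the login node makes `fork()` fail for some of the concurrently running array tasks. A hedged sketch of the two usual fixes:

```
# slurm.conf -- stop propagating the NPROC (max user processes) limit:
PropagateResourceLimitsExcept=NPROC

# or, per submission, raise the limit in the submitting shell first
# (4096 is an arbitrary example value):
#   ulimit -u 4096
#   sbatch --array=1-100%30 job.slurm
```

This also explains why the failures depend on the `%N` throttle: the more array tasks run at once on a node, the sooner the propagated per-user process cap is hit.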