[slurm-users] Strict GrpTRESMins limit

2024-01-17 Thread Kamil Wilczek
Login UsedTRES Name redacted 184311 gres/gpu redacted 1558558 cpu Could someone explain where the problem could be? Am I missing something? Apparently yes :) Kind regards -- Kamil Wilczek [https://keys.openpgp.org/] [D415917E84B8DA5A60E853B6E676ED

Re: [slurm-users] Multifactor fair-share with single account

2024-01-08 Thread Kamil Wilczek
Thank you all for the help! I created a setup with a single account and multi-factor scheduling with three non-zero weights: job age, job size and fair-share. I'll monitor the fair-share once enough users have registered on the cluster. Kind regards, -- Kamil Wilczek [https://keys.openpg
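
The setup described above can be sketched as a weighted sum. This is a minimal illustration only: it assumes the simplified form of the multifactor priority calculation in which each factor lies between 0.0 and 1.0 and is multiplied by its configured weight, and it ignores the partition, QOS, TRES, site and nice terms. The weight and factor values below are hypothetical and are not taken from the thread.

```python
def job_priority(factors, weights):
    """Sum weight * factor over the factors that carry a non-zero weight."""
    return sum(weights[name] * factors[name] for name in weights)

# Hypothetical weights for the three non-zero factors mentioned in the post.
weights = {"age": 1000, "job_size": 1000, "fairshare": 10000}

# Hypothetical per-job factor values, each normalised to the range 0.0-1.0.
factors = {"age": 0.5, "job_size": 0.1, "fairshare": 0.8}

priority = job_priority(factors, weights)
print(f"priority = {priority:.0f}")
```

With a fair-share weight that dominates the other two, a user's recent usage ends up deciding the ordering between otherwise similar jobs, even when every user sits under the same account.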

Re: [slurm-users] Multifactor fair-share with single account

2024-01-04 Thread Kamil Wilczek
On 4.01.2024 at 07:56, Loris Bennett wrote: Hi Kamil, Kamil Wilczek writes: Dear All, I have a question regarding the fair-share factor of the multifactor priority algorithm. My current understanding is that the fair-share makes sure that different *accounts* have a fair share of the

[slurm-users] Multifactor fair-share with single account

2024-01-03 Thread Kamil Wilczek
have, say, 3 accounts, but I do not want to calculate fair-share between accounts, but between all associations from all 3 accounts? In other words, is there a fair-share factor for users/associations instead of accounts? Kind regards -- Kamil Wilczek [https://keys.openpg

[slurm-users] TRES sreport per association

2023-11-12 Thread Kamil Wilczek
or selected users. Each user gets a QoS ("4gpu4d" means that a user can allocate 4 GPUs at most and a single job time limit is 4 days). Each user is also limited to a number of GPUMinutes for each association and it would be nice to know how many minutes are left per assoc. Kind

Re: [slurm-users] How to launch slurm services after installation

2022-11-28 Thread Kamil Wilczek
d each build should have separate config files. This is a bit complicated at first and requires solving several management problems, but after some time I think it allows for easier upgrades. Kind regards -- Kamil Wilczek [https://keys.openpgp.org/] [6C4BE20A90A1DBFB3CBE2947A832BF5A491F9F2A] On 2

Re: [slurm-users] gres.conf and select/cons_res plugin

2022-09-13 Thread Kamil Wilczek
om/gres.conf.html)  I see this: NOTE: Slurm support for gres/[mps|shard] requires the use of the select/cons_tres plugin. On my current (inherited) Slurm cluster we have:   SelectType=select/cons_res but users are primarily using GPU resources, so I know Gres is working. Why then is select/co

Re: [slurm-users] SlurmdSpoolDir

2022-08-16 Thread Kamil Wilczek
of "770" for the parent dir, which in my case is "/opt/slurm_state_dir" drwxrwxr-x 3 slurm slurm 26 Aug 11 19:49 slurm_state_dir Kind regards -- Kamil Wilczek On 16.08.2022 at 18:00, Kamil Wilczek wrote: Dear Slurm Users, recently, I have started a new instance of my

[slurm-users] SlurmdSpoolDir

2022-08-16 Thread Kamil Wilczek
he correct settings should be? I did not have such problems when using 19.05. Kind Regards -- Kamil Wilczek

[slurm-users] Using "srun" on compute nodes -- Ray cluster

2022-07-15 Thread Kamil Wilczek
partition" limits)? Kind Regards -- Kamil Wilczek [https://keys.openpgp.org/] [D415917E84B8DA5A60E853B6E676ED061316B69B]

Re: [slurm-users] Reply: how do slurmctld determine whether a compute node is not responding?

2022-07-11 Thread Kamil Wilczek
ion, especially if some resources are not reserved for the OS. -- On 11.07.2022 at 10:27, taleinterve...@sjtu.edu.cn wrote: Hello, Kamil Wilczek: Well, I agree that the non-responding case may be caused by an unstable network, since our slurm cluster has 2 part nodes geographical distant distri

Re: [slurm-users] how do slurmctld determine whether a compute node is not responding?

2022-07-11 Thread Kamil Wilczek
t action does slurmctld launch? How does it determine whether a node is responsive or non-responsive? And is it possible to customize slurmctld’s behavior on such detection, for example the wait timeout or retry count before determining that the node is not responding? -- Kamil Wilc

Re: [slurm-users] Heterogeneous GPU Node?

2022-06-23 Thread Kamil Wilczek
Name=gpu Type=titanx File=/dev/nvidia6 Name=gpu Type=titanx File=/dev/nvidia7 -- On 23.06.2022 at 22:40, Kamil Wilczek wrote: Hello, we have both homogeneous and heterogeneous GPU servers and all of them work without problems. We have mixed GTX 1080 Ti, Titan V and Titan X, but not the more

Re: [slurm-users] Heterogeneous GPU Node?

2022-06-23 Thread Kamil Wilczek
ason L. Simms, Ph.D., M.P.H.* Manager of Research and High-Performance Computing XSEDE Campus Champion Lafayette College Information Technology Services 710 Sullivan Rd | Easton, PA 18042 Office: 112 Skillman Library p: (610) 330-5632 -- Kamil Wilczek [https://keys.openpgp.org/] [D415917E84B8DA5A60E85

Re: [slurm-users] sreport time units explanation

2022-06-22 Thread Kamil Wilczek
Hmm, just by looking at those values it seems that this is simply the number reported by "sreport", divided by the number of hours in the specified period, multiplied by the number of GPUs. Something like GPUHours. -- On 22.06.2022 at 10:24, Kamil Wilczek wrote: Yes, it is po

Re: [slurm-users] sreport time units explanation

2022-06-22 Thread Kamil Wilczek
.41%). This seems reasonable to me. As there are 513 hours in the period, your user would have had to have used around 15 cards fairly continuously. Is that not possible? Cheers, Loris How should this value be interpreted? Kind Regards -- Kamil Wilczek [https://keys.openpgp.org/] [
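
The estimate of "around 15 cards" in the reply above can be reproduced with a short calculation. This is a sketch under two assumptions: that the value under discussion is the 7470 quoted in the original post, and that it is expressed in GPU-hours (the reply implies this, but the snippet does not state it).

```python
# Value quoted in the original post; assumed here to be GPU-hours.
gpu_hours_used = 7470

# Length of the reporting period, as stated in the reply.
hours_in_period = 513

# Average number of GPUs that would have to be busy for the whole period.
avg_gpus_in_use = gpu_hours_used / hours_in_period
print(f"average GPUs in use: {avg_gpus_in_use:.1f}")
```

Rounded to the nearest whole card, the result agrees with the figure given in the reply.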

[slurm-users] sreport time units explanation

2022-06-22 Thread Kamil Wilczek
me Used --- --- - redacted redacted gres/gpu 7470(23.11%) The number "7470" is obviously not a number of raw hours used by a user. How should this value be interpreted? Kind Regards -- Kamil Wilczek [https://keys.openpgp.org/] [D415917E84B8DA5A60E853B6E676ED0

Re: [slurm-users] Configuring SLURM on single node GPU cluster

2022-04-06 Thread Kamil Wilczek
ocalhost Name=gpu File=/dev/nvidia7 Best, Sushil -- Kamil Wilczek [https://keys.openpgp.org/] [D415917E84B8DA5A60E853B6E676ED061316B69B] Laboratorium Komputerowe Wydział Matematyki, Informatyki i Mechaniki Uniwersytet Warszawski ul. Banacha 2 02-097 Warszawa Tel.: 22 55 44 392 https://www.

Re: [slurm-users] Sharing a GPU

2022-04-05 Thread Kamil Wilczek
Thank you all for the help! The plugin seems to be the thing I'm looking for. I'll try to test it with a spare server/GPUs. Thanks again! -- Kamil Wilczek On 04.04.2022 at 09:20, Bas van der Vlies wrote: We have the exact same request for our GPUs that are not A100 and we have devel

[slurm-users] Sharing a GPU

2022-04-03 Thread Kamil Wilczek
gards -- Kamil Wilczek [https://keys.openpgp.org/] [D415917E84B8DA5A60E853B6E676ED061316B69B]