Login     Used     TRES Name
redacted  184311   gres/gpu
redacted  1558558  cpu
Could someone explain where the problem could be? Am I missing
something? Apparently yes :)
Kind regards
--
Kamil Wilczek [https://keys.openpgp.org/]
[D415917E84B8DA5A60E853B6E676ED
Thank you all for the help! I created a setup with a single account
and multi-factor scheduling with three non-zero weights: job age, job
size and fair-share. I'll monitor the fair-share once enough users
have registered on the cluster.
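
A minimal sketch of how a three-weight setup like this combines into a single priority value. It assumes the weighted-sum form described in the Slurm multifactor priority documentation, with every factor normalized to the range 0.0 to 1.0. The weight values, factor inputs and the `job_priority` helper are hypothetical and not taken from the configuration described above.

```python
# Hypothetical weights; in slurm.conf these would correspond to
# PriorityWeightAge, PriorityWeightJobSize and PriorityWeightFairshare.
WEIGHTS = {"age": 1000, "job_size": 500, "fairshare": 10000}


def job_priority(factors, weights=WEIGHTS):
    """Return the weighted sum of the factors, each expected in [0.0, 1.0]."""
    for name in weights:
        value = factors[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"factor {name!r} out of range: {value}")
    return round(sum(weights[name] * factors[name] for name in weights))


# Illustrative inputs only: a job that has waited half of the maximum age,
# requests a quarter of the cluster, and has a fair-share factor of 0.75.
print(job_priority({"age": 0.5, "job_size": 0.25, "fairshare": 0.75}))  # 8125
```

With a fair-share weight this much larger than the others, the fair-share factor dominates the ordering, which is one way to make under-served users move ahead in the queue.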
Kind regards,
--
Kamil Wilczek [https://keys.openpg
On 4.01.2024 at 07:56, Loris Bennett wrote:
Hi Kamil,
Kamil Wilczek writes:
Dear All,
I have a question regarding the fair-share factor of the multifactor
priority algorithm. My current understanding is that the fair-share
makes sure that different *accounts* have a fair share of the
have, say, 3 accounts, but I do not want to calculate
fair-share between accounts, but between all associations from all
3 accounts? In other words, is there a fair-share factor for
users/associations instead of accounts?
Kind regards
--
Kamil Wilczek [https://keys.openpg
or selected users.
Each user gets a QoS ("4gpu4d" means that a user can allocate
4 GPUs at most and a single job time limit is 4 days). Each
user is also limited to a number of GPUMinutes for each
association and it would be nice to know how many minutes
are left per assoc.
Kind
d each build should have separate config files.
This is a bit complicated at first and requires solving several
management problems, but after some time I think it allows for easier
upgrades.
Kind regards
--
Kamil Wilczek [https://keys.openpgp.org/]
[6C4BE20A90A1DBFB3CBE2947A832BF5A491F9F2A]
On 2
om/gres.conf.html) I see this:
NOTE: Slurm support for gres/[mps|shard] requires the use of the
select/cons_tres plugin.
On my current (inherited) Slurm cluster we have:
SelectType=select/cons_res
but users are primarily using GPU resources, so I know Gres is working.
Why then is select/co
of "770" for the parent dir, which in my case is
"/opt/slurm_state_dir"
drwxrwxr-x 3 slurm slurm 26 Aug 11 19:49 slurm_state_dir
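
A small sketch for reading the listing above, using Python's standard `stat` module to map octal permission bits to `ls -l` style strings. The helper names `dir_mode_string` and `grants_at_least` are hypothetical. It shows that `drwxrwxr-x` corresponds to octal 775, which includes every bit of 770.

```python
import stat


def dir_mode_string(octal_perms):
    """Render octal permission bits as an `ls -l` style string for a directory."""
    return stat.filemode(stat.S_IFDIR | octal_perms)


def grants_at_least(actual, required):
    """True if every permission bit in `required` is also set in `actual`."""
    return actual & required == required


print(dir_mode_string(0o770))         # drwxrwx---
print(dir_mode_string(0o775))         # drwxrwxr-x
print(grants_at_least(0o775, 0o770))  # True
```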
Kind regards
--
Kamil Wilczek
On 16.08.2022 at 18:00, Kamil Wilczek wrote:
Dear Slurm Users,
recently, I have started a new instance of my
he correct settings should be?
I did not have such problems when using 19.05.
Kind Regards
--
Kamil Wilczek
partition" limits)?
Kind Regards
--
Kamil Wilczek [https://keys.openpgp.org/]
[D415917E84B8DA5A60E853B6E676ED061316B69B]
ion, especially if some resources
are not reserved for the OS.
--
On 11.07.2022 at 10:27, taleinterve...@sjtu.edu.cn wrote:
Hello, Kamil Wilczek:
Well, I agree that the non-responding case may be caused by an unstable network, since
our Slurm cluster has 2 groups of nodes, geographically distant, distri
t action does slurmctld launch? How does
it determine whether a node is responsive or non-responsive?
And is it possible to customize slurmctld’s behavior on such detection,
for example the wait timeout or retry count before determining that the node is
not responding?
--
Kamil Wilc
Name=gpu Type=titanx File=/dev/nvidia6
Name=gpu Type=titanx File=/dev/nvidia7
--
On 23.06.2022 at 22:40, Kamil Wilczek wrote:
Hello,
we have both homogeneous and heterogeneous GPU servers and all of them
work without problems. We have mixed GTX 1080 Ti, Titan V and Titan X,
but not the more
ason L. Simms, Ph.D., M.P.H.
Manager of Research and High-Performance Computing
XSEDE Campus Champion
Lafayette College
Information Technology Services
710 Sullivan Rd | Easton, PA 18042
Office: 112 Skillman Library
p: (610) 330-5632
--
Kamil Wilczek [https://keys.openpgp.org/]
[D415917E84B8DA5A60E85
Hmm, just by looking at those values it seems that this is simply
the number reported by "sreport", divided by the number of hours
in the specified period, multiplied by the number of GPUs.
Something like GPUHours.
--
On 22.06.2022 at 10:24, Kamil Wilczek wrote:
Yes, it is po
.41%). This seems reasonable to me.
As there are 513 hours in the period, your user would have had to have
used around 15 cards fairly continuously. Is that not possible?
Cheers,
Loris
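
The arithmetic behind the "around 15 cards" estimate can be sketched as follows. This assumes the figure of 7470 quoted elsewhere on this page is GPU usage in hours for the same 513-hour period, which these snippets do not confirm. The function name `average_gpus_in_use` is hypothetical.

```python
def average_gpus_in_use(gpu_hours_used, hours_in_period):
    """Average number of GPUs busy at any moment over the reporting period."""
    if hours_in_period <= 0:
        raise ValueError("hours_in_period must be positive")
    return gpu_hours_used / hours_in_period


# Assumed inputs: 7470 GPU-hours used over a 513-hour period.
avg = average_gpus_in_use(7470, 513)
print(f"{avg:.1f}")  # 14.6
```

Under that assumption, the result is consistent with a user keeping roughly 15 GPUs busy for the whole period.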
How should this value be interpreted?
Kind Regards
--
Kamil Wilczek [https://keys.openpgp.org/]
[
me Used
--- --- -
redacted redacted gres/gpu 7470(23.11%)
The number "7470" is obviously not a number of raw hours used by a user.
How should this value be interpreted?
Kind Regards
--
Kamil Wilczek [https://keys.openpgp.org/]
[D415917E84B8DA5A60E853B6E676ED0
ocalhost Name=gpu File=/dev/nvidia7
Best,
Sushil
--
Kamil Wilczek [https://keys.openpgp.org/]
[D415917E84B8DA5A60E853B6E676ED061316B69B]
Laboratorium Komputerowe
Wydział Matematyki, Informatyki i Mechaniki
Uniwersytet Warszawski
ul. Banacha 2
02-097 Warszawa
Tel.: 22 55 44 392
https://www.
Thank you all for the help!
The plugin seems to be the thing I'm looking for.
I'll try to test it with a spare server/GPUs.
Thanks again!
--
Kamil Wilczek
On 04.04.2022 at 09:20, Bas van der Vlies wrote:
We have the exact same request for our GPUs that are not A100 and we
have devel
gards
--
Kamil Wilczek [https://keys.openpgp.org/]
[D415917E84B8DA5A60E853B6E676ED061316B69B]