Many thanks Brian and Jeffrey for your ideas,
Yes, at the moment I have all resources listed in the node's definition
line, and just one partition (see below).
Indeed, this config would work, as long as users cooperate and do not
abuse it by requesting all of the existing GPUs for their jobs.
But something that
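
For reference, a minimal sketch of that kind of single-node, single-partition
layout (the hostname, GPU count, and partition name below are illustrative
assumptions, not the poster's actual config):

    # slurm.conf (sketch; values are placeholders)
    GresTypes=gpu
    NodeName=gpuComputer Gres=gpu:12 CPUs=128 RealMemory=512000 State=UNKNOWN
    PartitionName=gpu Nodes=gpuComputer Default=YES MaxTime=INFINITE State=UP

With a layout like this, nothing stops a single job from requesting all of
the GPUs, which is the concern raised above.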
Hi All
Need some clarification on Fairshare (multifactor priority plugin) and the
Fair Tree algorithm.
If I read correctly, the current default for Slurm is the Fair Tree
algorithm, in which:
1. Priority can be set at various levels
2. Actual usage is not being considered for fairshare
3. Job submitted will
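
For context, these behaviours are driven by the priority settings in
slurm.conf; a minimal sketch (the weights and decay value are placeholder
assumptions, not recommendations):

    # slurm.conf (sketch; values are placeholders)
    PriorityType=priority/multifactor
    # Fair Tree is the default fairshare algorithm on current releases;
    # it can be switched off with: PriorityFlags=NO_FAIR_TREE
    PriorityDecayHalfLife=7-0
    PriorityWeightFairshare=10000
    PriorityWeightAge=1000
    PriorityWeightQOS=1000

Running 'sshare -l' shows the recorded usage and the per-level fairshare
values that Fair Tree computes, which helps when checking whether actual
usage is being taken into account.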
So the node definition is separate from the partition definition.
You would need to define all the GPUs as part of the node. Partitions do
not have physical characteristics, but they do have QOS capabilities
that you may be able to use. You could also use a job_submit lua script
to reject jobs
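
A rough sketch of the job_submit/lua idea (the per-job GPU limit is an
assumed site policy, and the job_desc fields available differ between Slurm
versions):

    -- job_submit.lua (sketch; enable with JobSubmitPlugins=lua in slurm.conf)
    -- Reject jobs that ask for more GPUs than an assumed per-job limit.
    local MAX_GPUS = 4  -- assumed policy, adjust to taste

    function slurm_job_submit(job_desc, part_list, submit_uid)
       -- Recent releases expose the request as e.g. "gres:gpu:8" in
       -- tres_per_node; older ones use job_desc.gres. Typed requests
       -- (gpu:<type>:N) would need a broader pattern than this.
       local gres = job_desc.tres_per_node or job_desc.gres
       if gres ~= nil then
          local n = tonumber(string.match(gres, "gpu:(%d+)"))
          if n ~= nil and n > MAX_GPUS then
             slurm.log_user("Jobs may request at most %d GPUs", MAX_GPUS)
             return slurm.ERROR  -- a specific slurm.ESLURM_* code also works
          end
       end
       return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
       return slurm.SUCCESS
    end

A QOS-based alternative would be something like
'sacctmgr modify qos normal set MaxTRESPerUser=gres/gpu=4', which caps GPUs
per user rather than per job.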
Hi Community,
I was checking the documentation but could not find clear information on
what I am trying to do.
Here at the university we have a large compute node with 3 classes of GPUs.
Let's say the node's hostname is "gpuComputer"; it is composed of:
- 4x large GPUs
- 4x medium GPUs (MIG devices
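
A sketch of how the three GPU classes could be exposed as typed GRES on that
node (the type names "large"/"medium"/"small", the counts, and the device
paths are assumptions for illustration):

    # slurm.conf (sketch)
    GresTypes=gpu
    NodeName=gpuComputer Gres=gpu:large:4,gpu:medium:4,gpu:small:4 State=UNKNOWN

    # gres.conf on gpuComputer (sketch)
    Name=gpu Type=large File=/dev/nvidia[0-3]
    # MIG-backed slices are often simplest to let Slurm discover with NVML:
    # AutoDetect=nvml

Users can then request a specific class, e.g. 'sbatch --gres=gpu:large:2 ...',
and per-type limits can be enforced with QOS or a job_submit filter as
discussed above.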