Hi everyone,
I need some advice on properly configuring Slurm on my HPC cluster, which 
consists of three compute nodes (each with 256 cores and 800GB of RAM).
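
For context, here is roughly how I was planning to describe the hardware in 
slurm.conf (the lowercase hostnames are placeholders, and RealMemory is in MB, 
so 800GB comes out to about 819200):

    # slurm.conf (sketch): three identical nodes; hostnames are placeholders
    NodeName=node[1-3] CPUs=256 RealMemory=819200 State=UNKNOWN
    # Schedule cores and memory together so per-group CPU/RAM caps are enforceable
    SelectType=select/cons_tres
    SelectTypeParameters=CR_Core_Memory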

The end users belong to three distinct groups: internal, external, and 
collaborator.

I would like internal users to have full and exclusive access to the first 
two compute nodes: by default, their jobs should be scheduled on Node1 or 
Node2, depending on availability, or queued if neither has free resources.
Internal users should also have access to 100 cores and 200GB of RAM on Node3, 
with this limit enforced as a hard cap across all of their jobs combined.
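
Here is a minimal sketch of what I have in mind for the internal side, 
assuming a Unix group named internal and QOS-based limits (all names are 
placeholders; as far as I understand, enforcing GrpTRES requires slurmdbd 
with AccountingStorageEnforce=limits,qos):

    # slurm.conf (sketch): node1+node2 form the default, internal-only partition
    PartitionName=internal Nodes=node[1-2] AllowGroups=internal Default=YES State=UP
    # Internal users reach node3 only through this partition, whose partition
    # QOS carries the hard cap across all of their jobs combined
    PartitionName=internal_n3 Nodes=node3 AllowGroups=internal QOS=internal_n3 State=UP

    # Shell: create the QOS with the 100-core / 200GB group-wide cap
    sacctmgr add qos internal_n3 GrpTRES=cpu=100,mem=200G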

The remaining 156 cores and 600GB of RAM on Node3 should be exclusively 
available to external and collaborator users.
Specifically, I would like the following setup:

    - The resources on Node3 should be split as follows:
        - 100 cores and 400GB of RAM for external users.
        - 56 cores and 200GB of RAM for collaborator users.

    - External users should be able to use the resources dedicated to 
collaborator users, but at the lowest priority. That is, an external user can 
submit a job requesting the resources reserved for collaborators, but the job 
should only start if no collaborator jobs are pending, and any job submitted 
by a collaborator, even after the external user's job, should take priority 
over it. The same rule should apply in reverse for collaborators trying to 
use external resources (see the sketch after this list).

    - There should be a group/partition/queue covering all of the remaining 
156 cores and 600GB of RAM on Node3. Both external and collaborator users 
should be able to submit jobs to it, but a job submitted there should only 
start if there are no pending jobs in the two dedicated partitions.
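
The best mechanism I have found so far for the Node3 split is overlapping 
partitions tied to shared partition QOSes, with PriorityTier providing the 
ordering between "native" and "borrowed" use. A sketch with placeholder names 
follows; I am assuming that a GrpTRES limit on a QOS referenced by two 
partitions is shared between them, and I am not sure PriorityTier alone gives 
the strict "only start if nothing is pending" behavior, so preemption might be 
needed on top:

    # slurm.conf (sketch): five overlapping partitions, all on node3.
    # Each QOS carries a GrpTRES budget; every partition that references the
    # same QOS draws from the same budget.

    # Dedicated shares (highest tier: considered first by the scheduler)
    PartitionName=ext    Nodes=node3 AllowGroups=external     QOS=ext_share    PriorityTier=10 State=UP
    PartitionName=collab Nodes=node3 AllowGroups=collaborator QOS=collab_share PriorityTier=10 State=UP

    # Borrowing the other group's share, at a lower tier
    PartitionName=ext_borrow    Nodes=node3 AllowGroups=external     QOS=collab_share PriorityTier=5 State=UP
    PartitionName=collab_borrow Nodes=node3 AllowGroups=collaborator QOS=ext_share    PriorityTier=5 State=UP

    # Catch-all over the whole 156-core / 600GB slice, lowest tier
    PartitionName=n3_shared Nodes=node3 AllowGroups=external,collaborator QOS=n3_shared PriorityTier=1 State=UP

    # Shell: the per-share budgets
    sacctmgr add qos ext_share    GrpTRES=cpu=100,mem=400G
    sacctmgr add qos collab_share GrpTRES=cpu=56,mem=200G
    sacctmgr add qos n3_shared    GrpTRES=cpu=156,mem=600G

Submission would then look like, e.g., sbatch -p ext job.sh for an external 
user's own share and sbatch -p ext_borrow job.sh to borrow the collaborator 
share. If strictly blocking borrowed and catch-all jobs whenever higher-tier 
jobs are pending turns out to be necessary, my understanding is that 
PreemptType=preempt/partition_prio with PreemptMode=REQUEUE on the lower-tier 
partitions would get close to it. One hole I can see is that the n3_shared 
budget overlaps the two dedicated budgets, so combined external/collaborator 
usage could physically eat into the internal 100-core slice on Node3; I have 
not worked out how to protect that slice yet.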


What is the best way to implement such an architecture? Are the sketches 
above on the right track?
Thanks, everyone, for your support!
