CPU limiting using ulimit is pretty straightforward with pam_limits and
/etc/security/limits.conf. On some of the login nodes we have a CPU-time limit
of 10 minutes, so heavy processes will fail.
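For reference, a minimal sketch of what that looks like (the group name
"users" and the pam service file are just illustrations, not our exact setup):

# /etc/security/limits.conf -- the "cpu" item is CPU time in minutes
@users    hard    cpu    10

# /etc/pam.d/sshd -- pam_limits must be in the session stack
session    required    pam_limits.so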
Memory was a bit more complicated (i.e. not pretty). We wanted a user not to
be able to use more than e.g. 1G for all of their processes combined.
Using systemd we added the file
/etc/systemd/system/user-.slice.d/20-memory.conf which contains:
[Slice]
MemoryLimit=1024M
MemoryAccounting=true
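To check that the drop-in took effect, something like the following should
work (UID 1000 is just an example):

systemctl daemon-reload
systemctl show -p MemoryLimit,MemoryAccounting user-1000.slice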
But we also wanted to restrict swap usage, and we're still on cgroup v1, so
systemd didn't help there. The ugly part is a pam_exec call to a script that
updates the memsw limit of the cgroup for the above slice. The script does
more things, but the swap section is more or less:
if [ "x$PAM_TYPE" = 'xopen_session' ]; then
_id=`id -u $PAM_USER`
if [ -z "$_id" ]; then
exit 1
fi
if [[ -e
/sys/fs/cgroup/memory/user.slice/user-${_id}.slice/memory.memsw.limit_in_bytes
]]; then
swap=$((1126 * 1024 * 1024))
echo $swap >
/sys/fs/cgroup/memory/user.slice/user-${_id}.slice/memory.memsw.limit_in_bytes
fi
fi
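We hook that script in with pam_exec, roughly like this (the script path and
the pam service file are placeholders, not our actual names):

# /etc/pam.d/sshd
session    optional    pam_exec.so /usr/local/sbin/limit-user-memsw.sh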
On Sun, Oct 31, 2021 at 6:36 PM Brian Andrus wrote:
> That is interesting to me.
>
> How do you use ulimit and systemd to limit user usage on the login nodes?
> This sounds like something very useful.
>
> Brian Andrus
> On 10/31/2021 1:08 AM, Yair Yarom wrote:
>
> Hi,
>
> If it helps, this is our setup:
> 6 clusters (actually a bit more)
> 1 mysql + slurmdbd on the same host
> 6 primary slurmctld on 3 hosts (need to make sure each has a distinct
> SlurmctldPort)
> 6 secondary slurmctld on an arbitrary node on the clusters themselves.
> 1 login node per cluster (this is a very small VM, and the users are
> limited both to cpu time (with ulimit) and memory (with systemd))
> The slurm.conf's are shared on nfs to everyone in
> /path/to/nfs/<cluster name>/slurm.conf, with a symlink from /etc for the
> relevant cluster per node.
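> For example, on a node belonging to clusterA (a sketch; the exact /etc
> location depends on the distribution's slurm package):
>
> ln -s /path/to/nfs/clusterA/slurm.conf /etc/slurm/slurm.conf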
>
> The -M generally works, we can submit/query jobs from a login node of one
> cluster to another. But there's a caveat to notice when upgrading. slurmdbd
> must be upgraded first, but usually we have a not so small gap between
> upgrading the different clusters. This causes the -M to stop working
> because binaries of one version won't work on the other (I don't remember
> in which direction).
> We solved this by using an lmod module per cluster, which sets both the
> SLURM_CONF environment variable and the PATH to the correct slurm binaries
> (which we install in /usr/local/slurm/<version>/ so that they co-exist). So
> when the -M won't work, users can use:
> module load slurm/clusterA
> squeue
> module load slurm/clusterB
> squeue
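> A minimal sketch of such a modulefile in Lmod's Lua syntax (the paths are
> illustrative and <version> is a placeholder for the actual install directory):
>
> -- slurm/clusterA.lua
> setenv("SLURM_CONF", "/path/to/nfs/clusterA/slurm.conf")
> prepend_path("PATH", "/usr/local/slurm/<version>/bin")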
>
> BR,
>
> On Thu, Oct 28, 2021 at 7:39 PM navin srivastava
> wrote:
>
>> Thank you Tina.
>> It will really help
>>
>> Regards
>> Navin
>>
>> On Thu, Oct 28, 2021, 22:01 Tina Friedrich
>> wrote:
>>
>>> Hello,
>>>
>>> I have the database on a separate server (it runs the database and the
>>> database only). The login nodes run nothing SLURM related, they simply
>>> have the binaries installed & a SLURM config.
>>>
>>> I've never looked into having multiple databases & using
>>> AccountingStorageExternalHost (in fact I'd forgotten you could do that),
>>> so I can't comment on that (maybe someone else can); I think that works,
>>> yes, but as I said never tested that (didn't see much point in running
>>> multiple databases if one would do the job).
>>>
>>> I actually have specific login nodes for both of my clusters, to make it
>>> easier for users (especially those with not much experience using the
>>> HPC environment); so I have one login node connecting to cluster 1 and
>>> one connecting to cluster 2.
>>>
>>> I think the relevant config entries (if I'm not mistaken) on the login
>>> nodes are probably the following.
>>>
>>> The differences in the slurm config files (that haven't got to do with
>>> topology & nodes & scheduler tuning) are
>>>
>>> ClusterName=cluster1
>>> ControlMachine=cluster1-slurm
>>> ControlAddr=/IP_OF_SLURM_CONTROLLER/
>>>
>>> ClusterName=cluster2
>>> ControlMachine=cluster2-slurm
>>> ControlAddr=/IP_OF_SLURM_CONTROLLER/
>>>
>>> (where IP_OF_SLURM_CONTROLLER is the IP address of host cluster1-slurm,
>>> same for cluster2)
>>>
>>> And then they have common entries for the AccountingStorageHost:
>>>
>>> AccountingStorageHost=slurm-db-prod
>>> AccountingStorageBackupHost=slurm-db-prod
>>> AccountingStoragePort=7030
>>> AccountingStorageType=accounting_storage/slurmdbd
>>>
>>> (slurm-db-prod is simply the hostname of the SLURM database server)
>>>
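>>> Once both clusters register against the same slurmdbd, cross-cluster
>>> queries should work from either login node, e.g. (a sketch):
>>>
>>> sacctmgr list cluster
>>> squeue -M cluster2
>>>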
>>> Does that help?
>>>
>>> Tina
>>>
>>> On 28/10/2021 14:59, navin srivastava wrote:
>>> > Thank you Tina.
>>> >
>>> > So if I understood correctly, the database is global to both clusters
>>> > and running on the login node?
>>> > Or is the database running on one of the master nodes and shared with
>>> > the other master server node?
>>> >
>>> > but as far I have read that the slurm database can a