Re: [slurm-users] How to deal with user running stuff in frontend node?

Ryan Cox Thu, 15 Feb 2018 13:07:34 -0800

Manuel,

We set up cgroups and also do cputime limits (60 minutes in our case) in limits.conf. Before libcgroup had support for a more generic "apply to each user" kind of thing, I created a pam module that handles all of that, which still works well for creating per-user limits. We also have something that whitelists various file transfer programs so they aren't subject to cputime limits. We include an oom notifier daemon so that users are alerted when their cgroup runs out of memory, since many people would otherwise have a tough time figuring out the exact cause of the "Killed" message. All of this is available in https://github.com/BYUHPC/uft (see the "Recommended Configuration" section in the README.md for "Login Nodes").
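
To illustrate what a cputime limit amounts to at the process level, here is a minimal Python sketch, not something taken from uft. The `cpu` item in limits.conf is expressed in minutes and ends up as the RLIMIT_CPU resource limit, which is counted in seconds of CPU time rather than wall-clock time. The helper name `run_with_cpu_limit` is made up for this example, and it assumes a Linux-like system.

```python
import resource
import subprocess
import sys


def run_with_cpu_limit(cmd, minutes):
    """Run cmd in a child process with a CPU-time limit given in minutes."""
    # limits.conf expresses the "cpu" item in minutes; setrlimit wants seconds.
    seconds = minutes * 60

    # An unprivileged process cannot raise its hard limit, so clamp the
    # request to whatever hard limit is already in force.
    _, current_hard = resource.getrlimit(resource.RLIMIT_CPU)
    if current_hard != resource.RLIM_INFINITY:
        seconds = min(seconds, current_hard)

    def apply_limit():
        # Runs in the child just before exec. The kernel sends SIGXCPU at
        # the soft limit and SIGKILL at the hard limit.
        resource.setrlimit(resource.RLIMIT_CPU, (seconds, seconds))

    return subprocess.run(cmd, preexec_fn=apply_limit,
                          capture_output=True, text=True)


if __name__ == "__main__":
    # Ask a child Python process to report the limit it inherited.
    probe = "import resource; print(resource.getrlimit(resource.RLIMIT_CPU))"
    result = run_with_cpu_limit([sys.executable, "-c", probe], minutes=60)
    print(result.stdout.strip())
```

Because the limit counts CPU time, a mostly idle monitoring script can stay logged in for days, while a compute job pegging a core is killed after roughly the configured number of minutes.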

We've had this in place for years and pretty much don't even have to think about this anymore. No complaints either.

If I had a user abusing the system after a warning I would probably either kick him off for a cooling off period and/or implement a very strict cputime limit (10 minutes?) in limits.conf just for him. Just my $0.02.
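
As a rough sketch of what such entries look like, the Python snippet below just formats limits.conf lines in the standard `<domain> <type> <item> <value>` layout. The account name `someuser` and the helper `limits_conf_entry` are placeholders for illustration only. The `*` domain is the default entry, and the `cpu` item is in minutes.

```python
def limits_conf_entry(domain, item, value, limit_type="hard"):
    """Format one limits.conf line: <domain> <type> <item> <value>."""
    return f"{domain}\t{limit_type}\t{item}\t{value}"


# "someuser" is a placeholder account name, not a real user from this thread.
lines = [
    limits_conf_entry("*", "cpu", 60),         # default: 60 CPU-minutes
    limits_conf_entry("someuser", "cpu", 10),  # stricter cap for one user
]
print("\n".join(lines))
```

Limits set this way are applied by pam_limits at login, so they only affect sessions started after the change.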


Ryan

On 02/15/2018 08:11 AM, Manuel Rodríguez Pascual wrote:

Hi all,
Although this is not strictly related to Slurm, maybe you can recommend some actions to deal with a particular user.
On our small cluster, there are currently no limits on running applications in the frontend. This is sometimes really useful for some users, for example to have scripts monitoring the execution of jobs and taking decisions depending on the partial results.
However, we have this user that keeps abusing this system: when the job queue is long and there is a significant wait time, he sometimes runs his jobs on the frontend, resulting in a CPU load of 100% and some delays in using it for the things it is supposed to serve (user login, monitoring and so on).
Have you faced the same issue? Is there any solution? I am thinking about using ulimit to limit the execution time of these jobs in the frontend to 5 minutes or so. This however does not look so elegant, as other users can perform the same abuse in the future, and he should also be able to run low cpu-consuming jobs for a longer period. However, I am not an experienced sysadmin, so I am completely open to suggestions or different ways of facing this issue.
Any thoughts?

cheers,




Manuel


--
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University
