Hi,
I have been trying to find out whether there is a way to restrict disk usage
with Slurm using cgroups.
I am aware of the TmpFS mechanism, but it seems to be used only to define the
required space and does not restrict usage if a file grows beyond a certain
size.
Any suggestion would be appreciated.
Hello Everyone,
We would like to improve visibility into our cluster usage.
We currently have Ganglia and use sacct, but I was wondering whether there is
a recommended web tool that provides both monitoring and accounting (user- and
admin-friendly)?
Thanks in advance
Christine
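
For a quick per-user overview while evaluating web tools, sacct output can be
aggregated with a short script. The following is only a minimal sketch: the
function names and the sample data are hypothetical, and it assumes the input
has the "User|Elapsed" shape produced by
"sacct -a -X -n -P --format=User,Elapsed", with Elapsed printed as
[DD-]HH:MM:SS.

```python
from collections import defaultdict


def elapsed_to_seconds(elapsed):
    """Convert a Slurm elapsed string ([DD-]HH:MM:SS) to seconds."""
    days = 0
    if "-" in elapsed:
        # A leading "DD-" prefix carries the number of whole days.
        day_part, elapsed = elapsed.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in elapsed.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def summarize(sacct_text):
    """Sum elapsed seconds per user from 'User|Elapsed' lines."""
    totals = defaultdict(int)
    for line in sacct_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        user, elapsed = line.split("|")
        totals[user] += elapsed_to_seconds(elapsed)
    return dict(totals)


# Hypothetical sample data, shaped like the assumed sacct output above.
sample = "alice|01:30:00\nbob|1-00:00:10\nalice|00:30:00\n"
print(summarize(sample))
```

In practice the text would come from running sacct (for example through
subprocess) rather than from a hard-coded string, and the field list would be
extended to whatever the site wants to report.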
At a place I worked before, we used XDMOD several years ago. It was a bit
tricky to set up correctly and not exactly intuitive to get started with
data collection as a user (managers, allocation specialists and
other not-super-technical people were most of our users). But when
familiarized with it,
Something I have been impressed with is Netdata.
It is in the standard repositories and will auto-detect quite a few
things on a node. It is great for real-time monitoring of a node/job.
I also use Prometheus and Grafana for historic data (anything over 5
minutes).
Brian Andrus
On 5/5/20
Hi All,
Quick question. Is there a way to limit the runtime on a partition only for
salloc? I would like batch jobs to have the partition's default max runtime,
but interactive jobs to have a shorter allowed runtime.
Thanks!