/sys/fs/cgroup/system.slice/cgroup.subtree_control
/usr/sbin/slurmstepd infinity &
*From:* Josef Dvoracek via slurm-users
*Sent:* Thursday, April 11, 2024 11:14 AM
*To:* slurm-users@lists.schedmd.com
*Subject:* [slurm-users] Re: Slurmd enabled crash with CgroupV2
I observe the same behavior on slurm 23.11.5, Rocky Linux 8.9.
> [root@compute ~]# cat /sys/fs/cgroup/cgroup.subtree_control
> memory pids
> [root@compute ~]# systemctl disable slurmd
> Removed /etc/systemd/system/multi-user.target.wants/slurmd.service.
> [root@compute ~]# cat /sys/fs/cgroup/cgroup.su
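For what it's worth, a minimal sketch of how the missing controllers can
be enabled by hand for testing (assuming a root shell on the node and
cgroup v2 mounted at /sys/fs/cgroup; the exact controller set is my
assumption based on what the cgroup/v2 plugin constrains):

echo "+cpuset +cpu +memory +pids" > /sys/fs/cgroup/cgroup.subtree_control
echo "+cpuset +cpu +memory +pids" > /sys/fs/cgroup/system.slice/cgroup.subtree_control
# verify - the controllers should now be listed
cat /sys/fs/cgroup/system.slice/cgroup.subtree_control

(The writes can fail with EBUSY if processes already sit in a child
cgroup, so this is a quick test, not a persistent fix.)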
Is there anybody here who has a nice visualization of JobComp and
JobacctGather data in Grafana?
I save JobComp data in Elasticsearch and JobacctGather data in InfluxDB,
and I'm thinking about how to provide meaningful insights to $users.
Things I'd like to show: especially memory & CPU utilization, job r
I use telegraf (which supports "exporter" output format as well) to
capture cgroupsv2 job data:
https://github.com/jose-d/telegraf-configs/tree/master/slurm-cgroupsv2
I had to rework it when changing from cgroupsv1 to cgroupsv2, as the
format/structure of the text files changed a bit.
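For the curious, roughly what such a collector ends up reading per job
under cgroupsv2 - a minimal sketch, where the path layout is an
assumption based on how slurmstepd typically lays out its scope under
system.slice on systemd-based nodes:

for job in /sys/fs/cgroup/system.slice/slurmstepd.scope/job_*; do
    echo "== ${job##*/} =="
    cat "$job/memory.current"   # current memory usage of the job, in bytes
    cat "$job/cpu.stat"         # usage_usec / user_usec / system_usec counters
done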
cheers
jos
I think you need to set a reasonable "DefMemPerCPU" - otherwise jobs will
take all the memory by default, and there is no memory left for the
second job.
We calculated DefMemPerCPU in such a way that the default allocated
memory of a full node is slightly under half of the total node memory. So
there i
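As a hypothetical worked example (numbers are mine, not from the thread):
on a 64-core node with 256 GiB of RAM, keeping the default full-node
allocation slightly under half of the memory would look like this:

echo $(( 256 * 1024 / 2 / 64 ))   # -> 2048 MiB per CPU as the upper bound

so slurm.conf could then carry something a bit below that, e.g.
DefMemPerCPU=1900.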
> I'm running slurm 22.05.11, which is available with OpenHPC 3.x
> Do you think an upgrade is needed?
I feel that a lot of slurm operators tend not to use 3rd party sources of
slurm binaries, as you do not have the build environment fully in your
hands.
But before making such a complex decision
I think installing/upgrading the "slurm" rpm will replace this shared lib.
Indeed, as always, test it first on a not-so-critical system and use VM
snapshots to be able to travel back in time ... as once you upgrade the
DB schema (if that is part of the upgrade), AFAIK you cannot go back.
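A sketch of the kind of dbd backup that pairs with those snapshots,
assuming the MariaDB/MySQL backend and the default slurm_acct_db
database name:

systemctl stop slurmdbd
mysqldump slurm_acct_db > slurm_acct_db-$(date +%F).sql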
josef
On 28. 02. 24 15:51
I see this question unanswered so far, so I'll give you my 2 cents:
A quick check reveals that the mentioned symbol is in libslurmfull.so:
[root@slurmserver2 ~]# nm -gD /usr/lib64/slurm/libslurmfull.so | grep
"slurm_conf$"
000d2c06 T free_slurm_conf
000d3345 T init_slurm_conf
0
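A follow-up check, assuming an RPM-based install with the library at that
path, to see which package actually owns it:

rpm -qf /usr/lib64/slurm/libslurmfull.so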
Hi Dietmar;
I tried this on ${my cluster}, as I switched to cgroupsv2 quite recently..
I must say that on my setup it looks like it works as expected; see the
grepped stdout from your reproducer below.
I use a recent slurm, 23.11.4.
Wild guess... does your build machine have the bpf and dbus devel packages in
For some unclear reason, "--wrap" was not part of my /repertoire/ so far.
thanks
On 26. 02. 24 9:47, Ward Poelmans via slurm-users wrote:
sbatch --wrap 'screen -D -m'
srun --jobid <jobid> --pty screen -rd
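A minimal sketch of gluing those two commands together (assuming nothing
about the site beyond a default partition; --parsable just makes the job
id easy to capture):

jobid=$(sbatch --parsable --wrap 'screen -D -m')
srun --jobid "$jobid" --pty screen -rd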
What is the recommended way to run longer interactive jobs at your systems?
Our how-to includes starting screen at a front-end node and running srun
with bash/zsh inside,
but that indeed creates a dependency between the login node (with screen)
and the compute node job.
On systems with multiple front-e
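For reference, a minimal sketch of that current how-to (the session name
and time limit are just illustrative):

screen -S work                    # on the login node
srun --pty --time=8:00:00 bash    # inside the screen session; tied to this login node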
> Just looking for some feedback, please. Is this OK? Is there a better way?
> I'm tempted to spec all new HPCs with only a high speed (200Gbps) IB network,
Well, you need Ethernet for OOB management (BMC/IPMI/iLO/whatever)
anyway... or?
cheers
josef
On 25. 02. 24 21:12, Dan Healy via slur
Isn't your /softs.. filesystem e.g. some cluster network filesystem mount?
It happened to me multiple times that I was attempting to build some
scientific software, and because I was building on top of BeeGFS (I think
hardlinks are not fully supported) or NFS (caching), I was getting
_interesti
My impression is that there are multiple challenges that make it hard
to create a good-for-all recent slurm RPM:
- NVML dependency - different sites use different NVML lib versions with
varying update cycles
- pmi* deps - some sites (like mine) are using only one reasonably recent
openpmix, I know