> Note: I’m aware that I can run Kube on a single node, but we need more
> resources. So ultimately we need a way to have Slurm and Kube exist in
> the same cluster, both sharing the full amount of resources and both
> being fully aware of resource usage.
This is something that we (SchedMD) are working on, although this is a
bit earlier than I'd planned to announce anything publicly...
This is a very high-level view, and I have to apologize for stalling a
bit, but: we've hired a team to build out a collection of tools that
we're calling "Slinky" [1]. These provide canonical ways of running
Slurm within Kubernetes, ways of maintaining and managing the cluster
state, and scheduling integration that allows compute nodes to be
available to both the Kubernetes and Slurm environments while
coordinating their status.
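To make the converged-node idea a little more concrete, here is a
rough, purely illustrative sketch (this is not Slinky, and not any
announced SchedMD interface) of one way to run the Slurm compute
daemon as an ordinary Kubernetes workload, using the official Python
client. The image, namespace, and ConfigMap names are placeholders
made up for this example:

    # Illustrative only: run slurmd on every node as a DaemonSet via the
    # official "kubernetes" Python client. Image/namespace/ConfigMap names
    # below are hypothetical placeholders, not Slinky artifacts.
    from kubernetes import client, config

    config.load_kube_config()  # use load_incluster_config() inside a pod

    SLURMD_IMAGE = "registry.example.com/slurmd:24.05"  # hypothetical image

    daemonset = client.V1DaemonSet(
        metadata=client.V1ObjectMeta(name="slurmd", namespace="slurm"),
        spec=client.V1DaemonSetSpec(
            selector=client.V1LabelSelector(match_labels={"app": "slurmd"}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "slurmd"}),
                spec=client.V1PodSpec(
                    host_network=True,  # slurmctld must reach slurmd at the node's address
                    containers=[
                        client.V1Container(
                            name="slurmd",
                            image=SLURMD_IMAGE,
                            security_context=client.V1SecurityContext(privileged=True),
                            volume_mounts=[
                                client.V1VolumeMount(name="slurm-conf",
                                                     mount_path="/etc/slurm"),
                            ],
                        )
                    ],
                    volumes=[
                        client.V1Volume(
                            name="slurm-conf",
                            config_map=client.V1ConfigMapVolumeSource(name="slurm-conf"),
                        )
                    ],
                ),
            ),
        ),
    )

    client.AppsV1Api().create_namespaced_daemon_set(namespace="slurm", body=daemonset)

The hostNetwork and privileged settings in the sketch only hint at why
this is harder than a typical stateless service: slurmd needs to be
reachable at the node itself and needs enough access to manage job
resources there. Coordinating node and resource state between the two
schedulers is the part the Slinky tooling is meant to address.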
We'll be talking about it in more detail at the Slurm User Group
Meeting in Oslo [3], then KubeCon North America in Salt Lake, and SC'24
in Atlanta. We'll have the (open-source, Apache 2.0 licensed) code for
our first development phase available by SC'24 if not sooner.
There's a placeholder documentation page [4] that points to some of the
presentations I've given previously about approaches to tackling this
converged-computing model, but I'll caution that they're a bit dated,
and the Slinky-specific presentations we've been working on internally
aren't publicly available yet.
For SchedMD support customers with specific use cases: please feel free
to ping your account managers if you'd like to chat at some point in
the next few months.
- Tim
[1] Slinky is not an acronym (neither is Slurm [2]), but loosely stands
for "Slurm in Kubernetes".
[2] https://slurm.schedmd.com/faq.html#acronym
[3] https://www.schedmd.com/about-schedmd/events/
[4] https://slurm.schedmd.com/slinky.html
--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support