Hi Christian,
On 2025-05-12 10:54, Christian Kastner wrote:
> [3]: Sadly, after much trying, it seems that the analog for [1] in
> rootless containers, using the 'podman+rocm' backend, is not
> possible due to some cgroupsv2 restriction. However, I still
> have the code for that, and I guess I could ship it for people who
> want to try it in rootful containers.
I take it you are referring to setting environment variables in podman
workers? The ROCR_VISIBLE_DEVICES variable can isolate the GPU at a
fairly low level in the ROCm userland [2]. Or do you mean passing through
only a subset of the devices? I forget how you were approaching this.
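To make the two options concrete, here is a rough sketch of how a worker
might build its podman command line. This is not the actual 'podman+rocm'
backend code. The helper name podman_gpu_args, the image name, and the
render node path /dev/dri/renderD128 are all hypothetical; only the
--device and --env options of podman and the ROCR_VISIBLE_DEVICES variable
are taken as given.

```python
import shlex


def podman_gpu_args(image, command, visible_devices=None, render_nodes=None):
    """Build a `podman run` argv limited to selected GPUs (hypothetical helper).

    visible_devices: GPU indices exported via ROCR_VISIBLE_DEVICES, so the
                     ROCm userland only enumerates those devices.
    render_nodes:    specific /dev/dri nodes to pass through; if omitted,
                     the whole /dev/dri directory is passed through.
    """
    # /dev/kfd is the compute interface shared by all AMD GPUs.
    argv = ["podman", "run", "--rm", "--device", "/dev/kfd"]

    if render_nodes:
        # Option 2: pass through only a subset of the device nodes.
        for node in render_nodes:
            argv += ["--device", node]
    else:
        argv += ["--device", "/dev/dri"]

    if visible_devices is not None:
        # Option 1: isolate through the environment variable.
        ids = ",".join(str(i) for i in visible_devices)
        argv += ["--env", f"ROCR_VISIBLE_DEVICES={ids}"]

    argv.append(image)
    argv += list(command)
    return argv


# Assumed image name and device path, for illustration only.
print(shlex.join(podman_gpu_args("example-image", ["rocminfo"],
                                 visible_devices=[0])))
print(shlex.join(podman_gpu_args("example-image", ["rocminfo"],
                                 render_nodes=["/dev/dri/renderD128"])))
```

The first call relies on the environment variable alone, and the second
relies on device passthrough alone. The two can also be combined.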
In any case, isolation via rootful containers would probably be useful as
an option. I'd like to limit Pinwheel and Arctophylax to a single GPU
[3]. They're getting a fair bit of interactive use now and that would
make it easier to share them. It's up to you, though.
Sincerely,
Cory Bloor
[2]: https://rocm.docs.amd.com/en/docs-6.4.0/conceptual/gpu-isolation.html
[3]: https://salsa.debian.org/rocm-team/community/team-project/-/wikis/Continuous-integration-workers