[slurm-users] Re: FreeBSD/aarch64: ld: error: unknown emulation: elf_aarch64
Hello,

I instructed the port to use binutils from ports (version 2.40, native)
instead of base:

`/usr/local/bin/ld: unrecognised emulation mode: elf_aarch64`

```
/usr/local/bin/ld -V | grep aarch64
   aarch64cloudabi
   aarch64cloudabib
   aarch64elf
   aarch64elf32
   aarch64elf32b
   aarch64elfb
   aarch64fbsd
   aarch64fbsdb
   aarch64haiku
   aarch64linux
   aarch64linux32
   aarch64linux32b
   aarch64linuxb
   aarch64pe
```

Any clues about the "elf_aarch64" vs. "aarch64elf" mismatch?

Thanks,

Christopher Samuel via slurm-users wrote (Saturday, 2024-05-04 at 20:27):

> On 5/4/24 4:24 am, Nuno Teixeira via slurm-users wrote:
>
> > Any clues?
> >
> >   ld: error: unknown emulation: elf_aarch64
>
> All I can think is that your ld doesn't like elf_aarch64; from the log
> you're posting it looks like that's being injected by the FreeBSD ports
> system. Looking at the man page for ld on Linux it says:
>
>   -m emulation
>       Emulate the emulation linker. You can list the available
>       emulations with the --verbose or -V options.
>
> So I'd guess you'd need to look at what that version of ld supports and
> then update the ports system to match.
>
> Good luck!
>
> All the best,
> Chris
> --
> Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

--
Nuno Teixeira
FreeBSD UNIX    Web: https://FreeBSD.org
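One quick way to see where the bogus emulation name comes from, before
blaming ld itself, is to grep the port's build tree. A minimal sketch,
assuming the default ports layout (the slurm-wlm and work-directory
paths here are illustrative, not taken from the thread):

```sh
# Confirm which aarch64 emulations this ld actually provides...
/usr/local/bin/ld -V | grep aarch64

# ...then locate where "elf_aarch64" gets injected during the build.
grep -rn 'elf_aarch64' /usr/ports/sysutils/slurm-wlm/Makefile \
    /usr/ports/sysutils/slurm-wlm/work/slurm-*/src 2>/dev/null
```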
[slurm-users] Re: Convergence of Kube and Slurm?
There is a Kubeflow offering that might be of interest:
https://www.dkube.io/post/mlops-on-hpc-slurm-with-kubeflow

I have not tried it myself, so I have no idea how well it works.

Regards,
--Dani_L.

On 05/05/2024 0:05, Dan Healy via slurm-users wrote:

> Bright Cluster Manager has some verbiage on their marketing site saying
> they can manage a cluster running both Kubernetes and Slurm. Maybe I
> misunderstood it. Nevertheless, I am more frequently encountering
> groups that want to run a stack of containers that need private
> container networking. What's the current state of using the same HPC
> cluster for both Slurm and Kube?
>
> Note: I'm aware that I can run Kube on a single node, but we need more
> resources. So ultimately we need a way to have Slurm and Kube exist in
> the same cluster, both sharing the full amount of resources and both
> being fully aware of resource usage.
>
> Thanks,
> Daniel Healy
[slurm-users] Invalid/incorrect gres.conf syntax
Hello,

I have configured my "gres.conf" in this way:

  NodeName=node-gpu-1 AutoDetect=off Name=gpu Type=GeForceRTX2070 File=/dev/nvidia0 Cores=0-11
  NodeName=node-gpu-1 AutoDetect=off Name=gpu Type=GeForceGTX1080Ti File=/dev/nvidia1 Cores=12-23
  NodeName=node-gpu-2 AutoDetect=off Name=gpu Type=GeForceGTX1080Ti File=/dev/nvidia0 Cores=0-11
  NodeName=node-gpu-2 AutoDetect=off Name=gpu Type=GeForceGTX1080 File=/dev/nvidia1 Cores=12-23
  NodeName=node-gpu-3 AutoDetect=off Name=gpu Type=GeForceRTX3080 File=/dev/nvidia0 Cores=0-11
  NodeName=node-gpu-4 AutoDetect=off Name=gpu Type=GeForceRTX3080 File=/dev/nvidia0 Cores=0-7

node-gpu-1 and node-gpu-2 are dual-socket systems; node-gpu-3 and
node-gpu-4 have only one socket.

In my "slurm.conf" I have these lines:

  AccountingStorageTRES=gres/gpu
  SelectType=select/cons_tres
  GresTypes=gpu
  NodeName=node-gpu-1 CPUs=24 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=96000 TmpDisk=47000 Gres=gpu:GeForceRTX2070:1,gpu:GeForceGTX1080Ti:1
  NodeName=node-gpu-2 CPUs=24 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=96000 TmpDisk=47000 Gres=gpu:GeForceGTX1080Ti:1,gpu:GeForceGTX1080:1
  NodeName=node-gpu-3 CPUs=12 SocketsPerBoard=1 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=23000 Gres=gpu:GeForceRTX3080:1
  NodeName=node-gpu-4 CPUs=8 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=7800 Gres=gpu:GeForceRTX3080:1

Is there anything invalid or incorrect in this syntax?

Thanks a lot!
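As a first sanity check (a common debugging step, not something from
the original message): slurmd can parse gres.conf and print the GRES
configuration it derives without actually starting the daemon, which
quickly shows whether the file is syntactically valid and whether the
Cores= ranges map onto real cores:

```sh
# On the compute node: parse gres.conf and print the resulting
# GRES configuration, then exit.
slurmd -G

# From a login node: compare against what the controller registered,
# and check for any drain reason mentioning gres.
scontrol show node node-gpu-1 | grep -i -e gres -e reason
```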
[slurm-users] Re: FreeBSD/aarch64: ld: error: unknown emulation: elf_aarch64
On Mon, 2024-05-06 at 11:38:30 +0100, Slurm users wrote:
> Hello,
>
> I instructed the port to use binutils from ports (version 2.40, native)
> instead of base:
>
> `/usr/local/bin/ld: unrecognised emulation mode: elf_aarch64`
> [...]
> Any clues about the "elf_aarch64" vs. "aarch64elf" mismatch?

This looks (I admit, I haven't UTSL) like the emulation mode name is
constructed from an "elf_" prefix plus the architecture nickname. That
works for "x86_64" and "i386", since ld for the Intel/AMD architectures
indeed provides the emulations "elf_x86_64" and "elf_i386", while for
64-bit ARM "elf" is used as a suffix. So this is mainly an ld
inconsistency, I'm afraid (which might be fixed by adding alias names -
but the hopes are pretty low).

Non-emulated builds shouldn't be affected by the issue you found, right?
(There is Slurm built for ARM64 Debian. Maybe they have patched the
source?)

I can imagine two ways to get this fixed:

(a) find the place where the emulation mode name is assembled, and teach
    it the exceptions to the implemented rule (there may be more than
    just ARM - what about RISC-V, PPC64*, ...?)

(b) interrupt the build at a reasonable place, find all occurrences of
    the wrong emulation string, and replace them with the existing
    counterpart

There should be no doubt which one I'd prefer - I'll go and read TS ;)

Cheers,
 Steffen

--
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~
Fon: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
~~~
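A rough sketch of what option (b) could look like for this port,
assuming the bogus string only lands in the generated Makefile.in
files (the work-directory path is illustrative, not from the thread):

```sh
# After the patch/configure phase: rewrite the emulation name this ld
# rejects into the one it actually provides, keeping .bak backups.
find work/slurm-*/src \( -name Makefile -o -name Makefile.in \) -print0 \
    | xargs -0 sed -i.bak -e 's|-m elf_aarch64|-m aarch64elf|g'
```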
[slurm-users] Rootless Docker Errors with Slurm
I am trying to integrate rootless Docker with Slurm. I have set up
rootless Docker as per the docs at
https://slurm.schedmd.com/containers.html . I have scrun.lua, oci.conf
(for crun) and slurm.conf in place. "~/.config/docker/daemon.json" and
"~/.config/systemd/user/docker.service.d/override.conf" are in place
too. But I can't seem to get it to work:

  $ docker run $DOCKER_SECURITY alpine /bin/printenv SLURM_JOB_ID
  Unable to find image 'alpine:latest' locally
  latest: Pulling from library/alpine
  4abcf2066143: Pull complete
  Digest: sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b
  Status: Downloaded newer image for alpine:latest
  docker: Error response from daemon: failed to create task for container:
  failed to create shim task: OCI runtime create failed: unable to retrieve
  OCI runtime error (open
  /run/user/1000/docker-exec/containerd/daemon/io.containerd.runtime.v2.task/moby/97e8dd767977ac03ab7af54c015c0fd5dfd26e737771b977acb7e41f799023aa/log.json:
  no such file or directory): /usr/bin/scrun did not terminate successfully:
  exit status 1: unknown.

One thing: if I don't use Slurm as the runtime for Docker (i.e., if I
remove "~/.config/docker/daemon.json"), then docker runs fine.
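For comparison, the daemon.json shape that the containers guide
describes looks roughly like this (a sketch reconstructed from the
documentation, not the poster's actual file; the /usr/bin/scrun path is
an assumption based on the error message above):

```json
{
  "default-runtime": "slurm",
  "runtimes": {
    "slurm": { "path": "/usr/bin/scrun" }
  },
  "experimental": true,
  "iptables": false,
  "bridge": "none",
  "no-new-privileges": true,
  "rootless": true
}
```

Since plain Docker works once daemon.json is removed, the failure is
inside scrun itself (it exited with status 1 before containerd's OCI
log was ever written), so checking scrun's own logging for the reason
it bailed out would be the natural next step.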
[slurm-users] Re: FreeBSD/aarch64: ld: error: unknown emulation: elf_aarch64
On 5/6/24 6:38 am, Nuno Teixeira via slurm-users wrote:

> Any clues about the "elf_aarch64" vs. "aarch64elf" mismatch?

As I mentioned, I think this is coming from the FreeBSD patching that's
being done to the upstream Slurm sources. Specifically, it looks like
elf_aarch64 is being injected here:

  /usr/bin/sed -i.bak -e 's|"/proc|"/compat/linux/proc|g' \
      -e 's|(/proc)|(/compat/linux/proc)|g' \
      /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/slurmd/slurmstepd/req.c
  /usr/bin/find \
      /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/api \
      /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/plugins/openapi \
      /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/sacctmgr \
      /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/sackd \
      /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/scontrol \
      /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/scrontab \
      /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/scrun \
      /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/slurmctld \
      /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/slurmd/slurmd \
      /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/squeue \
      -name Makefile.in | /usr/bin/xargs /usr/bin/sed -i.bak \
      -e 's|-r -o|-r -m elf_aarch64 -o|'

So I guess that will need to be fixed to match what FreeBSD supports.

I don't think this is a Slurm issue from what I see there.

All the best,
Chris

--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
[slurm-users] Re: FreeBSD/aarch64: ld: error: unknown emulation: elf_aarch64
Hello,

I too think this is the cause, and I really missed it:

  .if ${ARCH} == powerpc64le
          ${FIND} ${LLD2FIX:C|^|${WRKSRC}/src/|} -name Makefile.in | ${XARGS} \
                  ${REINPLACE_CMD} -e 's|-r -o|-r -m elf64lppc -o|'
  .elif ${ARCH} == powerpc64
          ${FIND} ${LLD2FIX:C|^|${WRKSRC}/src/|} -name Makefile.in | ${XARGS} \
                  ${REINPLACE_CMD} -e 's|-r -o|-r -m elf64ppc -o|'
  .else
          ${FIND} ${LLD2FIX:C|^|${WRKSRC}/src/|} -name Makefile.in | ${XARGS} \
                  ${REINPLACE_CMD} -e 's|-r -o|-r -m elf_${ARCH} -o|'

I will adjust it and see the build result.

Thanks,

Christopher Samuel via slurm-users wrote (Monday, 2024-05-06 at 14:35):

> On 5/6/24 6:38 am, Nuno Teixeira via slurm-users wrote:
>
> > Any clues about the "elf_aarch64" vs. "aarch64elf" mismatch?
>
> As I mentioned, I think this is coming from the FreeBSD patching that's
> being done to the upstream Slurm sources. Specifically, it looks like
> elf_aarch64 is being injected here:
> [...]
> So I guess that will need to be fixed to match what FreeBSD supports.
>
> I don't think this is a Slurm issue from what I see there.
>
> All the best,
> Chris

--
Nuno Teixeira
FreeBSD UNIX    Web: https://FreeBSD.org
[slurm-users] Re: FreeBSD/aarch64: ld: error: unknown emulation: elf_aarch64
(...)

Fixed with:

  +.elif ${ARCH} == aarch64
  +        ${FIND} ${LLD2FIX:C|^|${WRKSRC}/src/|} -name Makefile.in | ${XARGS} \
  +                ${REINPLACE_CMD} -e 's|-r -o|-r -m aarch64elf -o|'

Thanks, and sorry for the noise, as I really missed this detail :)

Cheers,

Nuno Teixeira wrote (Monday, 2024-05-06 at 19:59):

> Hello,
>
> I too think this is the cause, and I really missed it:
> [...]
> I will adjust it and see the build result.
>
> Thanks,

--
Nuno Teixeira
FreeBSD UNIX    Web: https://FreeBSD.org
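If useful, a quick check that the substitution actually landed before
rebuilding (the work-directory path is illustrative):

```sh
# Every generated Makefile.in should now carry the BSD-style name.
grep -rn --include='Makefile.in' -- '-m aarch64elf' work/slurm-*/src
```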
[slurm-users] Re: FreeBSD/aarch64: ld: error: unknown emulation: elf_aarch64
On 5/6/24 3:19 pm, Nuno Teixeira via slurm-users wrote:

> Fixed with: [...]
>
> Thanks, and sorry for the noise, as I really missed this detail :)

So glad it helped! Best of luck with this work.

--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
[slurm-users] Re: Convergence of Kube and Slurm?
> Note: I’m aware that I can run Kube on a single node, but we need more
> resources. So ultimately we need a way to have Slurm and Kube exist in
> the same cluster, both sharing the full amount of resources and both
> being fully aware of resource usage.

This is something that we (SchedMD) are working on, although it's a bit
earlier than I was planning to publicly announce anything...

This is a very high-level view, and I have to apologize for stalling a
bit, but: we've hired a team to build out a collection of tools that
we're calling "Slinky" [1]. These provide canonical ways of running
Slurm within Kubernetes, ways of maintaining and managing the cluster
state, and scheduling integration to allow compute nodes to be
available to both Kubernetes and Slurm environments while coordinating
their status.

We'll be talking about it in more detail at the Slurm User Group
Meeting in Oslo [3], then at KubeCon North America in Salt Lake City,
and at SC'24 in Atlanta. We'll have the (open-source, Apache 2.0
licensed) code for our first development phase available by SC'24, if
not sooner.

There's a placeholder documentation page [4] that points to some of the
presentations I've given previously about approaches to tackling this
converged-computing model, but I'll caution that they're a bit dated,
and the Slinky-specific presentations we've been working on internally
aren't publicly available yet.

If there are SchedMD support customers with specific use cases, please
feel free to ping your account managers if you'd like to chat at some
point in the next few months.

- Tim

[1] Slinky is not an acronym (neither is Slurm [2]), but loosely stands
    for "Slurm in Kubernetes".
[2] https://slurm.schedmd.com/faq.html#acronym
[3] https://www.schedmd.com/about-schedmd/events/
[4] https://slurm.schedmd.com/slinky.html

--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support
[slurm-users] Re: Convergence of Kube and Slurm?
Tim Wickberg via slurm-users writes:

> [1] Slinky is not an acronym (neither is Slurm [2]), but loosely
> stands for "Slurm in Kubernetes".

And not at all inspired by Slinky Dog in Toy Story, I guess. :D

--
Cheers,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo