[slurm-users] Re: FreeBSD/aarch64: ld: error: unknown emulation: elf_aarch64

2024-05-06 Thread Nuno Teixeira via slurm-users
Hello,

I instructed the port to use binutils from ports (version 2.40, native)
instead of the base system's, and now get:

`/usr/local/bin/ld: unrecognised emulation mode: elf_aarch64`

```
/usr/local/bin/ld -V |grep aarch64
   aarch64cloudabi
   aarch64cloudabib
   aarch64elf
   aarch64elf32
   aarch64elf32b
   aarch64elfb
   aarch64fbsd
   aarch64fbsdb
   aarch64haiku
   aarch64linux
   aarch64linux32
   aarch64linux32b
   aarch64linuxb
   aarch64pe
```

Any clues about "elf_aarch64" and "aarch64elf" mismatch?
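
For reference, a minimal way to check which emulation name this ld will
accept for a relocatable link (the same `-r -m ...` invocation the ports
framework injects; the scratch object file is just an example):

```
# Create a scratch object, then try each candidate emulation name
# with the same "-r -m" relocatable link the ports framework uses.
echo 'int x;' | cc -c -x c - -o scratch.o
/usr/local/bin/ld -r -m aarch64elf scratch.o -o merged.o && echo "aarch64elf accepted"
/usr/local/bin/ld -r -m elf_aarch64 scratch.o -o merged.o || echo "elf_aarch64 rejected"
```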

Thanks,

Christopher Samuel via slurm-users  wrote
(Saturday, 4/05/2024 at 20:27):

> On 5/4/24 4:24 am, Nuno Teixeira via slurm-users wrote:
>
> > Any clues?
> >
> >  > ld: error: unknown emulation: elf_aarch64
>
> All I can think is that your ld doesn't like elf_aarch64; from the log
> you're posting it looks like that's being injected by the FreeBSD ports
> system. Looking at the man page for ld on Linux, it says:
>
>-m emulation
> Emulate the emulation linker.  You can list the available
> emulations with the --verbose or -V options.
>
> So I'd guess you'd need to look at what that version of ld supports and
> then update the ports system to match.
>
> Good luck!
>
> All the best,
> Chris
> --
> Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>


-- 
Nuno Teixeira
FreeBSD UNIX: Web:  https://FreeBSD.org

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Convergence of Kube and Slurm?

2024-05-06 Thread Daniel Letai via slurm-users

There is a kubeflow offering that might be of interest:
https://www.dkube.io/post/mlops-on-hpc-slurm-with-kubeflow


I have not tried it myself, no idea how well it works.


Regards,
--Dani_L.


On 05/05/2024 0:05, Dan Healy via slurm-users wrote:

Bright Cluster Manager has some verbiage on their marketing site that they
can manage a cluster running both Kubernetes and Slurm. Maybe I
misunderstood it. But nevertheless, I am encountering groups more
frequently that want to run a stack of containers that need private
container networking.

What’s the current state of using the same HPC cluster for both Slurm and
Kube?

Note: I’m aware that I can run Kube on a single node, but we need more
resources. So ultimately we need a way to have Slurm and Kube exist in the
same cluster, both sharing the full amount of resources and both being
fully aware of resource usage.

Thanks,

Daniel Healy


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Invalid/incorrect gres.conf syntax

2024-05-06 Thread Gestió Servidors via slurm-users
Hello,

I have configured my "gres.conf" in this way:
NodeName=node-gpu-1 AutoDetect=off Name=gpu Type=GeForceRTX2070 File=/dev/nvidia0 Cores=0-11
NodeName=node-gpu-1 AutoDetect=off Name=gpu Type=GeForceGTX1080Ti File=/dev/nvidia1 Cores=12-23
NodeName=node-gpu-2 AutoDetect=off Name=gpu Type=GeForceGTX1080Ti File=/dev/nvidia0 Cores=0-11
NodeName=node-gpu-2 AutoDetect=off Name=gpu Type=GeForceGTX1080 File=/dev/nvidia1 Cores=12-23
NodeName=node-gpu-3 AutoDetect=off Name=gpu Type=GeForceRTX3080 File=/dev/nvidia0 Cores=0-11
NodeName=node-gpu-4 AutoDetect=off Name=gpu Type=GeForceRTX3080 File=/dev/nvidia0 Cores=0-7

node-gpu-1 and node-gpu-2 are two systems with two sockets; node-gpu-3 and 
node-gpu-4 have only one socket.
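
One way to double-check that those Cores= ranges match each GPU's actual
CPU affinity is the NVIDIA driver's topology matrix; its "CPU Affinity"
column should agree with what gres.conf says:

```
nvidia-smi topo -m
```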


In my "slurm.conf" I have these lines:
AccountingStorageTRES=gres/gpu
SelectType=select/cons_tres
GresTypes=gpu
NodeName=node-gpu-1 CPUs=24 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=96000 TmpDisk=47000 Gres=gpu:GeForceRTX2070:1,gpu:GeForceGTX1080Ti:1
NodeName=node-gpu-2 CPUs=24 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=96000 TmpDisk=47000 Gres=gpu:GeForceGTX1080Ti:1,gpu:GeForceGTX1080:1
NodeName=node-gpu-3 CPUs=12 SocketsPerBoard=1 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=23000 Gres=gpu:GeForceRTX3080:1
NodeName=node-gpu-4 CPUs=8 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=7800 Gres=gpu:GeForceRTX3080:1
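
A quick way to verify what Slurm actually parses out of these two files is
to ask the daemons directly (standard commands; node-gpu-1 is one of the
nodes above):

```
# On a compute node: print the GRES configuration slurmd detects, then exit.
slurmd -G
# From the controller: confirm what slurmctld registered for the node.
scontrol show node node-gpu-1 | grep -i gres
```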


Thanks a lot!

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: FreeBSD/aarch64: ld: error: unknown emulation: elf_aarch64

2024-05-06 Thread Steffen Grunewald via slurm-users
On Mon, 2024-05-06 at 11:38:30 +0100, Slurm users wrote:
> Hello,
> 
> I instructed port to use binutils from ports (version 2.40 native) instead
> of base:
> 
> `/usr/local/bin/ld: unrecognised emulation mode: elf_aarch64`
> 
> ```
> /usr/local/bin/ld -V |grep aarch64
>aarch64cloudabi
>aarch64cloudabib
>aarch64elf
>aarch64elf32
>aarch64elf32b
>aarch64elfb
>aarch64fbsd
>aarch64fbsdb
>aarch64haiku
>aarch64linux
>aarch64linux32
>aarch64linux32b
>aarch64linuxb
>aarch64pe
> ```
> 
> Any clues about "elf_aarch64" and "aarch64elf" mismatch?

This looks (I admit, I haven't UTSL) like the emulation mode is constructed
from an "elf_" prefix and the architecture nickname. This works for "x86_64"
and "i386", since ld for the Intel/AMD architectures indeed provides the
emulations "elf_x86_64" and "elf_i386", while for 64-bit ARM "elf" is used as
a suffix. So this is mainly an ld inconsistency, I'm afraid (which might be
fixed by adding alias names - but the hopes are pretty low).

Non-emulated builds shouldn't be affected by the issue you found, right?
(There is Slurm built for ARM64 Debian. Maybe they have patched the source?)


I can imagine two ways to get this fixed:
(a) find the place where the emulation mode name is combined, and teach it
  about possible exceptions to the implemented rule (there may be more than
  just ARM - what about RISC-V, PPC64*, ...?); see the sketch below
(b) interrupt the build in a reasonable place, find all occurrences of the
  wrong emulation string, and replace it with its existing counterpart
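
For (a), a sketch of what that exception handling might look like as an
arch-to-emulation map in the port's Makefile (untested; the variable name is
made up, aarch64elf comes from the `ld -V` listing above, and elf64ppc /
elf64lppc are binutils' PPC64 emulation names):

```
# Hypothetical variable name; the idea is a single arch -> emulation map.
.if ${ARCH} == aarch64
LD_EMULATION=	aarch64elf
.elif ${ARCH} == powerpc64le
LD_EMULATION=	elf64lppc
.elif ${ARCH} == powerpc64
LD_EMULATION=	elf64ppc
.else
LD_EMULATION=	elf_${ARCH}
.endif
```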

There should be no doubt which one I'd prefer - I'll go and read TS ;)

Cheers,
 Steffen


-- 
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~
Fon: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
~~~

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Rootless Docker Errors with Slurm

2024-05-06 Thread ARNULD via slurm-users
I am trying to integrate Rootless Docker with Slurm. I have set up Rootless
Docker as per the docs "https://slurm.schedmd.com/containers.html". I have
scrun.lua, oci.conf (for crun) and slurm.conf in place. Then
"~/.config/docker/daemon.json" and
"~/.config/systemd/user/docker.service.d/override.conf" are in place too.
But I can't seem to get it to work:

$ docker run $DOCKER_SECURITY alpine /bin/printenv SLURM_JOB_ID
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
4abcf2066143: Pull complete
Digest:
sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b
Status: Downloaded newer image for alpine:latest
docker: Error response from daemon: failed to create task for container:
failed to create shim task: OCI runtime create failed: unable to retrieve
OCI runtime error (open
/run/user/1000/docker-exec/containerd/daemon/io.containerd.runtime.v2.task/moby/97e8dd767977ac03ab7af54c015c0fd5dfd26e737771b977acb7e41f799023aa/log.json:
no such file or directory): /usr/bin/scrun did not terminate successfully:
exit status 1: unknown.

One thing to note: if I don't use Slurm as the runtime for Docker (i.e. if
I remove "~/.config/docker/daemon.json"), then Docker runs fine.

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: FreeBSD/aarch64: ld: error: unknown emulation: elf_aarch64

2024-05-06 Thread Christopher Samuel via slurm-users

On 5/6/24 6:38 am, Nuno Teixeira via slurm-users wrote:


Any clues about "elf_aarch64" and "aarch64elf" mismatch?


As I mentioned, I think this is coming from the FreeBSD patching that's 
being done to the upstream Slurm sources; specifically, it looks like 
elf_aarch64 is being injected here:


/usr/bin/sed -i.bak -e 's|"/proc|"/compat/linux/proc|g' -e 's|(/proc)|(/compat/linux/proc)|g' 
/wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/slurmd/slurmstepd/req.c
/usr/bin/find 
/wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/api 
/wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/plugins/openapi 
/wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/sacctmgr 
/wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/sackd 
/wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/scontrol 
/wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/scrontab 
/wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/scrun 
/wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/slurmctld 
/wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/slurmd/slurmd 
/wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/squeue 
-name Makefile.in | /usr/bin/xargs /usr/bin/sed -i.bak -e 's|-r -o|-r -m elf_aarch64 -o|'


So I guess that will need to be fixed to match what FreeBSD supports.
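
For anyone wanting to chase it down, one plausible way to find the
injection point in the ports tree (path and pattern are guesses, not
verified):

```
grep -n 'elf_' /usr/ports/sysutils/slurm-wlm/Makefile
```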

I don't think this is a Slurm issue from what I see there.

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: FreeBSD/aarch64: ld: error: unknown emulation: elf_aarch64

2024-05-06 Thread Nuno Teixeira via slurm-users
Hello,

I too think this is the cause, and I really missed it:

.if ${ARCH} == powerpc64le
	${FIND} ${LLD2FIX:C|^|${WRKSRC}/src/|} -name Makefile.in | ${XARGS} \
		${REINPLACE_CMD} -e 's|-r -o|-r -m elf64lppc -o|'
.elif ${ARCH} == powerpc64
	${FIND} ${LLD2FIX:C|^|${WRKSRC}/src/|} -name Makefile.in | ${XARGS} \
		${REINPLACE_CMD} -e 's|-r -o|-r -m elf64ppc -o|'
.else
	${FIND} ${LLD2FIX:C|^|${WRKSRC}/src/|} -name Makefile.in | ${XARGS} \
		${REINPLACE_CMD} -e 's|-r -o|-r -m elf_${ARCH} -o|'

I will adjust it and see build result.

Thanks,

Christopher Samuel via slurm-users  wrote
(Monday, 6/05/2024 at 14:35):

> On 5/6/24 6:38 am, Nuno Teixeira via slurm-users wrote:
>
> > Any clues about "elf_aarch64" and "aarch64elf" mismatch?
>
> As I mentioned I think this is coming from the FreeBSD patching that's
> being done to the upstream Slurm sources, specifically it looks like
> elf_aarch64 is being injected here:
>
> /usr/bin/sed -i.bak -e 's|"/proc|"/compat/linux/proc|g'  -e
> 's|(/proc)|(/compat/linux/proc)|g'
>
> /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/slurmd/slurmstepd/req.c
> /usr/bin/find
> /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/api
> /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/plugins/openapi
>
> /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/sacctmgr
> /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/sackd
> /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/scontrol
> /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/scrontab
> /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/scrun
> /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/slurmctld
> /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/slurmd/slurmd
> /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/squeue
> -name Makefile.in | /usr/bin/xargs /usr/bin/sed -i.bak -e 's|-r -o|-r -m elf_aarch64 -o|'
>
> So I guess that will need to be fixed to match what FreeBSD supports.
>
> I don't think this is a Slurm issue from what I see there.
>
> All the best,
> Chris
> --
> Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>


-- 
Nuno Teixeira
FreeBSD UNIX: Web:  https://FreeBSD.org

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: FreeBSD/aarch64: ld: error: unknown emulation: elf_aarch64

2024-05-06 Thread Nuno Teixeira via slurm-users
(...)

Fixed with:

+.elif ${ARCH} == aarch64
+	${FIND} ${LLD2FIX:C|^|${WRKSRC}/src/|} -name Makefile.in | ${XARGS} \
+		${REINPLACE_CMD} -e 's|-r -o|-r -m aarch64elf -o|'

Thanks and sorry for the noise as I really missed this detail :)

Cheers,

Nuno Teixeira  wrote (Monday, 6/05/2024 at
19:59):

> Hello,
>
> I too think this is the cause, and I really missed it:
>
> .if ${ARCH} == powerpc64le
> 	${FIND} ${LLD2FIX:C|^|${WRKSRC}/src/|} -name Makefile.in | ${XARGS} \
> 		${REINPLACE_CMD} -e 's|-r -o|-r -m elf64lppc -o|'
> .elif ${ARCH} == powerpc64
> 	${FIND} ${LLD2FIX:C|^|${WRKSRC}/src/|} -name Makefile.in | ${XARGS} \
> 		${REINPLACE_CMD} -e 's|-r -o|-r -m elf64ppc -o|'
> .else
> 	${FIND} ${LLD2FIX:C|^|${WRKSRC}/src/|} -name Makefile.in | ${XARGS} \
> 		${REINPLACE_CMD} -e 's|-r -o|-r -m elf_${ARCH} -o|'
>
> I will adjust it and see build result.
>
> Thanks,
>
> Christopher Samuel via slurm-users 
> wrote (Monday, 6/05/2024 at 14:35):
>
>> On 5/6/24 6:38 am, Nuno Teixeira via slurm-users wrote:
>>
>> > Any clues about "elf_aarch64" and "aarch64elf" mismatch?
>>
>> As I mentioned I think this is coming from the FreeBSD patching that's
>> being done to the upstream Slurm sources, specifically it looks like
>> elf_aarch64 is being injected here:
>>
>> /usr/bin/sed -i.bak -e 's|"/proc|"/compat/linux/proc|g'  -e
>> 's|(/proc)|(/compat/linux/proc)|g'
>>
>> /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/slurmd/slurmstepd/req.c
>> /usr/bin/find
>> /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/api
>> /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/plugins/openapi
>>
>> /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/sacctmgr
>> /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/sackd
>> /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/scontrol
>> /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/scrontab
>> /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/scrun
>> /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/slurmctld
>> /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/slurmd/slurmd
>>
>> /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/squeue
>> -name Makefile.in | /usr/bin/xargs /usr/bin/sed -i.bak -e 's|-r -o|-r -m elf_aarch64 -o|'
>>
>> So I guess that will need to be fixed to match what FreeBSD supports.
>>
>> I don't think this is a Slurm issue from what I see there.
>>
>> All the best,
>> Chris
>> --
>> Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
>>
>>
>> --
>> slurm-users mailing list -- slurm-users@lists.schedmd.com
>> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>>
>
>
> --
> Nuno Teixeira
> FreeBSD UNIX: Web:  https://FreeBSD.org
>


-- 
Nuno Teixeira
FreeBSD UNIX: Web:  https://FreeBSD.org

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: FreeBSD/aarch64: ld: error: unknown emulation: elf_aarch64

2024-05-06 Thread Christopher Samuel via slurm-users

On 5/6/24 3:19 pm, Nuno Teixeira via slurm-users wrote:


Fixed with:


[...]


Thanks and sorry for the noise as I really missed this detail :)


So glad it helped! Best of luck with this work.

--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Convergence of Kube and Slurm?

2024-05-06 Thread Tim Wickberg via slurm-users
Note: I’m aware that I can run Kube on a single node, but we need more 
resources. So ultimately we need a way to have Slurm and Kube exist in 
the same cluster, both sharing the full amount of resources and both 
being fully aware of resource usage.


This is something that we (SchedMD) are working on, although it's a bit 
earlier than I was planning to publicly announce anything...


This is a very high-level view, and I have to apologize for stalling a 
bit, but: we've hired a team to build out a collection of tools that 
we're calling "Slinky" [1]. These provide for canonical ways of running 
Slurm within Kubernetes, ways of maintaining and managing the cluster 
state, and scheduling integration to allow for compute nodes to be 
available to both Kubernetes and Slurm environments while coordinating 
their status.


We'll be talking about it in more detail at the Slurm User Group 
Meeting in Oslo [3], then KubeCon North America in Salt Lake City, and SC'24 
in Atlanta. We'll have the (open-source, Apache 2.0 licensed) code for 
our first development phase available by SC'24 if not sooner.


There's a placeholder documentation page [4] that points to some of the 
presentations I've given before about approaches to tackling this 
converged-computing model, but I'll caution they're a bit dated, and the 
Slinky-specific presentations we've been working on internally aren't 
publicly available yet.


If there are SchedMD support customers that have specific use cases, 
please feel free to ping your account managers if you'd like to chat at 
some point in the next few months.


- Tim

[1] Slinky is not an acronym (neither is Slurm [2]), but loosely stands 
for "Slurm in Kubernetes".


[2] https://slurm.schedmd.com/faq.html#acronym

[3] https://www.schedmd.com/about-schedmd/events/

[4] https://slurm.schedmd.com/slinky.html

--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Convergence of Kube and Slurm?

2024-05-06 Thread Bjørn-Helge Mevik via slurm-users
Tim Wickberg via slurm-users  writes:

> [1] Slinky is not an acronym (neither is Slurm [2]), but loosely
> stands for "Slurm in Kubernetes".

And not at all inspired by Slinky Dog in Toy Story, I guess. :D

-- 
Cheers,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo




-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com