[slurm-users] Re: srun weirdness

2024-05-14 Thread Feng Zhang via slurm-users
Do you have a container setup configured? On Tue, May 14, 2024 at 3:57 PM Feng Zhang wrote: > > Not sure, very strange, while the two linux-vdso.so.1 entries look different: > > [deej@moose66 ~]$ ldd /mnt/local/ollama/ollama > linux-vdso.so.1 (0x7ffde81ee000) > > > [deej@moose66 ~]$ ldd /mnt/local/ollama
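
A quick way to test the container hypothesis is to inspect the job's view of the system from inside an srun shell and compare it with a plain ssh session on the same node. The node name moose66 is taken from the thread; the specific checks are only a sketch, since neither cluster's container configuration (if any) is shown:

    # from a login node: open an interactive shell on the suspect node via Slurm
    srun -w moose66 --pty /bin/bash
    # inside that shell, look for signs of a container or unusual cgroup setup
    cat /proc/self/cgroup
    env | grep -i -e container -e apptainer -e singularity
    # repeat the same checks in a plain ssh session to moose66 and compare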

[slurm-users] Slurm release candidate version 24.05.0rc1 available for testing

2024-05-14 Thread Marshall Garey via slurm-users
We are pleased to announce the availability of Slurm release candidate 24.05.0rc1. To highlight some new features coming in 24.05: - (Optional) isolated Job Step management. Enabled on a job-by-job basis with the --stepmgr option, or globally through SlurmctldParameters=enable_stepmgr. - Fede
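
The announcement mentions both a per-job flag and a cluster-wide setting for the new isolated step management feature. A minimal sketch of the two ways to enable it, based only on the option names given above (the job script name is hypothetical):

    # per job: request isolated step management for this submission only
    sbatch --stepmgr my_job.sh

    # cluster-wide: enable it for all jobs via slurm.conf
    SlurmctldParameters=enable_stepmgr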

[slurm-users] Re: srun weirdness

2024-05-14 Thread Feng Zhang via slurm-users
Not sure, very strange, while the two linux-vdso.so.1 entries look different: [deej@moose66 ~]$ ldd /mnt/local/ollama/ollama linux-vdso.so.1 (0x7ffde81ee000) [deej@moose66 ~]$ ldd /mnt/local/ollama/ollama linux-vdso.so.1 (0x7fffa66ff000) Best, Feng On Tue, May 14, 2024 at 3:43 PM D
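
Note that the linux-vdso.so.1 load address changes on every run because of address-space layout randomization, so that difference by itself is not meaningful. A more useful check is to diff the full ldd output taken inside an srun-allocated shell against the output from a plain ssh session on the same node; a sketch, assuming bash and the node name moose66 from the thread:

    # compare the dynamic linker's view under srun vs. ssh on the same node
    diff <(srun -w moose66 ldd /mnt/local/ollama/ollama) \
         <(ssh moose66 ldd /mnt/local/ollama/ollama)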

[slurm-users] Re: srun weirdness

2024-05-14 Thread Dj Merrill via slurm-users
Hi Feng, Thank you for replying. It is the same binary on the same machine that fails. If I ssh to a compute node on the second cluster, it works fine. It fails when running in an interactive shell obtained with srun on that same compute node. I agree that it seems like a runtime environment
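
Since the same binary on the same node behaves differently under ssh and under an srun shell, the difference is likely in the inherited environment or resource limits rather than the binary itself. A sketch of how one might capture and compare the two environments; the node name and temporary file paths are assumptions:

    # capture the environment and limits seen by a plain ssh login
    ssh moose66 'env | sort > /tmp/env.ssh; ulimit -a > /tmp/ulimit.ssh'
    # capture the same from inside a Slurm-allocated shell on that node
    srun -w moose66 bash -c 'env | sort > /tmp/env.srun; ulimit -a > /tmp/ulimit.srun'
    # then diff the pairs on the node
    ssh moose66 'diff /tmp/env.ssh /tmp/env.srun; diff /tmp/ulimit.ssh /tmp/ulimit.srun'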

[slurm-users] Re: srun weirdness

2024-05-14 Thread Feng Zhang via slurm-users
Looks more like a runtime environment issue. Check the binaries: run ldd /mnt/local/ollama/ollama on both clusters; comparing the output may give some hints. Best, Feng On Tue, May 14, 2024 at 2:41 PM Dj Merrill via slurm-users wrote: > > I'm running into a strange issue and I'm hoping anoth
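
A sketch of the comparison Feng suggests, recording the resolved libraries from a node on each cluster and diffing the results; the hostnames node-c1 and node-c2 are placeholders:

    # on a compute node of each cluster, record the resolved libraries
    ssh node-c1 ldd /mnt/local/ollama/ollama > ldd.cluster1
    ssh node-c2 ldd /mnt/local/ollama/ollama > ldd.cluster2
    # differing library paths or "not found" entries point at a runtime mismatch
    diff ldd.cluster1 ldd.cluster2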

[slurm-users] srun weirdness

2024-05-14 Thread Dj Merrill via slurm-users
I'm running into a strange issue and I'm hoping another set of brains looking at this might help.  I would appreciate any feedback. I have two Slurm Clusters.  The first cluster is running Slurm 21.08.8 on Rocky Linux 8.9 machines.  The second cluster is running Slurm 23.11.6 on Rocky Linux 9.

[slurm-users] Re: Submitting from an untrusted node

2024-05-14 Thread Brian Andrus via slurm-users
Rike, Assuming the data, scripts and other dependencies are already on the cluster, you could just ssh and execute the sbatch command in a single shot: ssh submitnode sbatch some_script.sh It will ask for a password if appropriate, and you could use ssh keys to bypass that need. Brian Andrus O
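
A minimal sketch of Brian's suggestion, assuming the script already lives on the submit host and that passwordless keys are acceptable; submitnode and some_script.sh are the names used in his reply, and the user name is a placeholder:

    # one-time: install an ssh key on the submit host to avoid password prompts
    ssh-copy-id user@submitnode
    # submit from the untrusted workstation in a single shot
    ssh user@submitnode sbatch some_script.sh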

[slurm-users] Submitting from an untrusted node

2024-05-14 Thread Rike-Benjamin Schuppner via slurm-users
Hi, If I understand it correctly, the MUNGE and SACK authentication modules naturally require that no one can get access to the key. This means that we should not use our normal workstations, to which our users have physical access, to run any jobs, nor could our users use the workstations to sub