[slurm-users] Re: /etc/passwd sync?

2025-02-11 Thread Feng Zhang via slurm-users
Keeping /etc/passwd and /etc/group synced across all the nodes should work. You will also need to set up an SSH key for MPI. Best, Feng On Mon, Feb 10, 2025 at 10:29 PM mark.w.moorcroft--- via slurm-users < slurm-users@lists.schedmd.com> wrote: > If you set up slurm elastic cloud in EC2 without LDAP, wha
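
A minimal sketch of how the "keep /etc/passwd synced" advice could be checked, assuming you have already collected the passwd file contents from the head node and from a compute node as strings. The function names `parse_passwd` and `passwd_mismatches` and the sample users are hypothetical; how the files are copied around (rsync, configuration management, image baking) is left open, since the thread does not say.

```python
def parse_passwd(text):
    """Map user name -> (uid, gid) from passwd-format text."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # skip malformed lines rather than guess
        entries[fields[0]] = (int(fields[2]), int(fields[3]))
    return entries


def passwd_mismatches(reference, node):
    """Return users whose (uid, gid) on the node differs from the reference.

    A value of None on the node side means the user is missing there.
    """
    ref, other = parse_passwd(reference), parse_passwd(node)
    return {
        user: (ids, other.get(user))
        for user, ids in ref.items()
        if other.get(user) != ids
    }


# Hypothetical sample data, not taken from the thread.
head = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "alice:x:1001:1001::/home/alice:/bin/bash\n"
    "bob:x:1002:1002::/home/bob:/bin/bash\n"
)
node = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "alice:x:1005:1001::/home/alice:/bin/bash\n"
)
print(passwd_mismatches(head, node))
```

An empty result would suggest the node agrees with the reference for every user the reference defines; it does not check users that exist only on the node.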

[slurm-users] Re: Print Slurm Stats on Login

2024-08-28 Thread Feng Zhang via slurm-users
You can also check https://github.com/prod-feng/slurm_tools; slurm_job_perf_show.py may be helpful. I used to use slurm_job_perf_show_email.py to email users a summary of their usage, e.g. monthly, but some users seemed to get confused, so I stopped. Best, Feng On Fri, Aug 9, 20

[slurm-users] Re: Can Not Use A Single GPU for Multiple Jobs

2024-06-21 Thread Feng Zhang via slurm-users
Yes, the algorithm should be like that: 1 CPU (core) per job (task). As someone mentioned already, you need to set --oversubscribe=10 on CPU cores, meaning 10 jobs on each core in your case (in slurm.conf). Best, Feng On Fri, Jun 21, 2024 at 6:52 AM Arnuld via slurm-users wrote: > > > Every job will need

[slurm-users] maxrss reported by sacct is wrong

2024-06-07 Thread Feng Zhang via slurm-users
Hi All, I am having trouble calculating the real RSS memory usage of certain kinds of users' jobs, for which sacct returns wrong numbers. Rocky Linux release 8.5, Slurm 21.08 (slurm.conf) ProctrackType=proctrack/cgroup JobAcctGatherType=jobacct_gather/linux The troublesome jobs are like: 1. python

[slurm-users] Re: srun weirdness

2024-05-14 Thread Feng Zhang via slurm-users
so.1 (0x14a9d82d) > > libdl.so.2 => /lib64/libdl.so.2 (0x000014a9d82c9000) > > libm.so.6 => /lib64/libm.so.6 (0x14a9d7f25000) > > libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x14a9d82ae000) > > libc.so.6 => /lib64/libc.so.6 (0x14a9d7

[slurm-users] Re: srun weirdness

2024-05-14 Thread Feng Zhang via slurm-users
4/libdl.so.2 (0x14a9d82c9000) > libm.so.6 => /lib64/libm.so.6 (0x14a9d7f25000) > libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x14a9d82ae000) > libc.so.6 => /lib64/libc.so.6 (0x14a9d7c0) > /lib64/ld-linux-x86-64.so.2 (0x14a9d8306000)

[slurm-users] Re: srun weirdness

2024-05-14 Thread Feng Zhang via slurm-users
Looks more like a runtime environment issue. Check the binaries: running ldd /mnt/local/ollama/ollama on both clusters and comparing the output may give some hints. Best, Feng On Tue, May 14, 2024 at 2:41 PM Dj Merrill via slurm-users wrote: > > I'm running into a strange issue and I'm hoping anoth
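
The ldd comparison suggested above can be sketched roughly as follows, assuming the ldd output from each cluster has been saved as text. The helper names `parse_ldd` and `diff_ldd` and the sample library lines are hypothetical; load addresses are ignored because they normally differ from run to run and say nothing about which library was resolved.

```python
import re

# Trailing load address such as "(0x000014a9d82c9000)".
_ADDR = re.compile(r"\s*\(0x[0-9a-fA-F]+\)\s*$")


def parse_ldd(text):
    """Map library name -> resolved path (or 'not found') from ldd output."""
    libs = {}
    for line in text.splitlines():
        line = _ADDR.sub("", line.strip())
        if not line:
            continue
        if "=>" in line:
            name, _, target = line.partition("=>")
            libs[name.strip()] = target.strip() or None
        else:
            # Entries without "=>" (e.g. the dynamic loader) map to themselves.
            libs[line] = line
    return libs


def diff_ldd(output_a, output_b):
    """Return libraries whose resolution differs between the two outputs."""
    a, b = parse_ldd(output_a), parse_ldd(output_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Hypothetical sample output, not taken from the thread.
cluster_a = (
    "\tlibm.so.6 => /lib64/libm.so.6 (0x00007f00)\n"
    "\tlibcuda.so.1 => /usr/lib64/libcuda.so.1 (0x00007f11)\n"
)
cluster_b = (
    "\tlibm.so.6 => /lib64/libm.so.6 (0x00001234)\n"
    "\tlibcuda.so.1 => not found\n"
)
print(diff_ldd(cluster_a, cluster_b))
```

Any library that resolves to a different path, or to "not found", on one cluster is a candidate explanation for behavior that differs between the two environments.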