Keeping /etc/passwd and /etc/group synced to all the nodes should work. You
will also need to set up an SSH key for MPI.
Best,
Feng
On Mon, Feb 10, 2025 at 10:29 PM mark.w.moorcroft--- via slurm-users <
slurm-users@lists.schedmd.com> wrote:
> If you set up slurm elastic cloud in EC2 without LDAP, wha
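A minimal sketch of both steps described above, assuming root SSH access from the head node (the node names are hypothetical; tools like pdsh or a configuration manager do this more robustly):

```shell
# 1. Push the account databases to every compute node (hypothetical names).
for n in node01 node02; do
    scp /etc/passwd /etc/group "root@$n:/etc/"
done

# 2. One-time passwordless SSH key per user, for MPI process launch.
ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519
cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```

If home directories are on shared storage (e.g. NFS), the key generated in step 2 is then valid on every node.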
You can also check https://github.com/prod-feng/slurm_tools
slurm_job_perf_show.py may be helpful.
I used to use slurm_job_perf_show_email.py to send users emails
summarizing their usage, e.g. monthly. But some users seemed to get
confused, so I stopped.
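Independent of the scripts in that repo, a minimal sketch of the kind of per-user summary such an email could contain, built on sacct's pipe-delimited output. The sample lines below are invented stand-ins for real `sacct -a -X -n -P --format=User,JobID,Elapsed` output:

```shell
# Stand-in for real sacct output (User|JobID|Elapsed); the values are invented.
cat <<'EOF' > /tmp/sacct_sample.txt
alice|101|01:00:00
bob|102|00:30:00
alice|103|02:00:00
EOF

# Count jobs per user from the pipe-delimited records.
awk -F'|' '{count[$1]++} END {for (u in count) print u, count[u]}' \
    /tmp/sacct_sample.txt | sort
```

The same awk pattern extends to summing Elapsed or MaxRSS columns before mailing the result.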
Best,
Feng
On Fri, Aug 9, 20
Yes, the algorithm should work like that: 1 CPU (core) per job (task).
As someone already mentioned, you need OverSubscribe (e.g. FORCE:10) on the
partition in slurm.conf, meaning up to 10 jobs on each core in your case.
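A hedged slurm.conf sketch of that setting (the partition and node names are hypothetical):

```
# slurm.conf fragment (hypothetical names); FORCE:10 lets up to
# 10 jobs share each core in this partition.
PartitionName=shared Nodes=node[01-04] Default=YES OverSubscribe=FORCE:10 State=UP
```

With this, ten single-core jobs can run concurrently on each core of the shared partition.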
Best,
Feng
On Fri, Jun 21, 2024 at 6:52 AM Arnuld via slurm-users
wrote:
>
> > Every job will need
Hi All,
I am having trouble calculating the real RSS memory usage of some kinds
of user jobs; sacct returns wrong numbers for them.
Rocky Linux release 8.5, Slurm 21.08
(slurm.conf)
ProctrackType=proctrack/cgroup
JobAcctGatherType=jobacct_gather/linux
The problematic jobs look like:
1. python
> so.1 (0x14a9d82d)
> libdl.so.2 => /lib64/libdl.so.2 (0x000014a9d82c9000)
> libm.so.6 => /lib64/libm.so.6 (0x14a9d7f25000)
> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x14a9d82ae000)
> libc.so.6 => /lib64/libc.so.6 (0x14a9d7
> /lib64/ld-linux-x86-64.so.2 (0x14a9d8306000)
>
This looks more like a runtime environment issue.
Check the binary:
ldd /mnt/local/ollama/ollama
on both clusters; comparing the output may give some hints.
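A sketch of that comparison, assuming you can run the command on one node of each cluster (the hostnames are hypothetical):

```shell
# Collect the dynamic-linker resolution from one node of each cluster.
ssh cluster1-node ldd /mnt/local/ollama/ollama > /tmp/ldd.cluster1
ssh cluster2-node ldd /mnt/local/ollama/ollama > /tmp/ldd.cluster2

# Differences here usually point at missing or mismatched shared libraries.
diff /tmp/ldd.cluster1 /tmp/ldd.cluster2
```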
Best,
Feng
On Tue, May 14, 2024 at 2:41 PM Dj Merrill via slurm-users
wrote:
>
> I'm running into a strange issue and I'm hoping anoth