Re: [slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Loris Bennett
"Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)" writes:
> On Sep 29, 2022, at 10:34 AM, Steffen Grunewald wrote:
>
> Hi Noam,
>
> I'm wondering why one would want to know that - given that there are
> approaches to multi-node operation beyond MPI (Charm++ comes to mind)?
>
> The

Re: [slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
On Sep 29, 2022, at 10:34 AM, Steffen Grunewald <steffen.grunew...@aei.mpg.de> wrote:

Hi Noam,

I'm wondering why one would want to know that - given that there are approaches to multi-node operation beyond MPI (Charm++ comes to mind)?

The thread title requested a way of detecting non-MPI

Re: [slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Steffen Grunewald
On Thu, 2022-09-29 at 14:03:58 +, Bernstein, Noam CIV USN NRL (6393) Washington DC (USA) wrote:
> Can you check slurm for a job that requests multiple nodes but doesn't have
> mpirun (or srun, or mpiexec) running on its head node?

Hi Noam,

I'm wondering why one would want to know that - giv

Re: [slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Ward Poelmans
Hi Loris,

On 29/09/2022 09:26, Loris Bennett wrote:
> I can see that this is potentially not easy, since an MPI job might still have phases where only one core is actually being used.

Slurm will create the needed cgroups on all the nodes that are part of the job when the job starts. So yo

Re: [slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
Can you check slurm for a job that requests multiple nodes but doesn't have mpirun (or srun, or mpiexec) running on its head node?
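A minimal sketch of the check suggested above, in Python. It assumes the job's node count and the command lines running on its head node have already been collected by some other means (this sketch does not query Slurm or the node itself); the function name `looks_non_mpi` and its arguments are hypothetical.

```python
import os

# Launcher names taken from the message above.
MPI_LAUNCHERS = {"mpirun", "srun", "mpiexec"}


def looks_non_mpi(num_nodes, head_node_commands):
    """Return True if a job spans several nodes but no launcher runs on its head node.

    num_nodes: number of nodes allocated to the job (assumed to be known already).
    head_node_commands: command lines of the job's processes on its first node.
    """
    if num_nodes < 2:
        # Single-node jobs are not the case being looked for.
        return False
    # Reduce each command line to the bare executable name, e.g. "/usr/bin/srun" -> "srun".
    names = {
        os.path.basename(cmd.split()[0])
        for cmd in head_node_commands
        if cmd.strip()
    }
    return names.isdisjoint(MPI_LAUNCHERS)
```

A hit is only a candidate for closer inspection: as noted elsewhere in the thread, there are approaches to multi-node operation beyond MPI, and such jobs would be flagged by this check as well.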

Re: [slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Loris Bennett
Hi Ole,

Ole Holm Nielsen writes:
> Hi Loris,
>
> On 9/29/22 09:26, Loris Bennett wrote:
>> Has anyone already come up with a good way to identify non-MPI jobs which
>> request multiple cores but don't restrict themselves to a single node,
>> leaving cores idle on all but the first node?
>> I can

Re: [slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Loris Bennett
Hi Davide,

That is an interesting idea. We already do some averaging, but over the whole of the past month. For each user we use the output of seff to generate two scatterplots: CPU-efficiency vs. CPU-hours and memory-efficiency vs. GB-hours. See https://www.fu-berlin.de/en/sites/high-perfo

Re: [slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Ole Holm Nielsen
Hi Loris,

On 9/29/22 09:26, Loris Bennett wrote:
> Has anyone already come up with a good way to identify non-MPI jobs which request multiple cores but don't restrict themselves to a single node, leaving cores idle on all but the first node?
> I can see that this is potentially not easy, since an M

Re: [slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Davide DelVento
At my previous job there were cron jobs running everywhere measuring possibly idle cores which were eventually averaged out for the duration of the job, and reported (the day after) via email to the user support team. I believe they stopped doing so when compute became (relatively) cheap at the exp
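The averaging step described in this message could look roughly like the following Python sketch. It assumes some periodic probe has already recorded, for one job, pairs of idle and allocated core counts; how those samples are gathered is not shown, and the name `mean_idle_fraction` is hypothetical.

```python
def mean_idle_fraction(samples):
    """Average the idle-core fraction over all samples taken during a job.

    samples: list of (idle_cores, allocated_cores) pairs, one per measurement.
    Returns None when there is nothing to average.
    """
    # Skip samples with no allocated cores to avoid division by zero.
    fractions = [idle / alloc for idle, alloc in samples if alloc > 0]
    if not fractions:
        return None
    return sum(fractions) / len(fractions)
```

For example, `mean_idle_fraction([(4, 8), (8, 8)])` gives 0.75, meaning that on average three quarters of the allocated cores were idle across the two measurements.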

[slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Loris Bennett
Hi,

Has anyone already come up with a good way to identify non-MPI jobs which request multiple cores but don't restrict themselves to a single node, leaving cores idle on all but the first node?

I can see that this is potentially not easy, since an MPI job might still have phases where only