Re: [slurm-users] pam_slurm_adopt does not constrain memory?

2018-10-24 Thread Christopher Samuel
On 25/10/18 2:29 pm, Christopher Samuel wrote: Could explain why this isn't something we see consistently, and why we're both seeing it currently. This seems to be a handy way to find any processes that are not properly constrained by Slurm cgroups on compute nodes (at least in our configura

Re: [slurm-users] pam_slurm_adopt does not constrain memory?

2018-10-24 Thread Christopher Samuel
On 24/10/18 9:37 pm, Chris Samuel wrote: We're on 17.11.7 (for the moment, starting to plan upgrade to 18.08.x). From the NEWS file in 17.11.x (in this case for 17.11.10): -- Fix pam_slurm_adopt to honor action_adopt_failure. Could explain why this isn't something we see consistently, and w

Re: [slurm-users] Can't find an address

2018-10-24 Thread Lachlan Musicman
On Wed, 24 Oct 2018 at 22:56, Zohar Roe MLM wrote: > Hello, > > I have a node that for some reason changes state to "Down" every few > minutes. > > When I change it with scontrol to "resume" it's OK until it goes Down again. > > In the slurm server log I can see the error: > > "agent/is_node_resp: node:myName

Re: [slurm-users] Looking for old SLURM versions

2018-10-24 Thread Bob Healey
Thanks.  I feel stupid for not finding that on my own. On 10/24/2018 06:02 PM, Andy Riebs wrote: Bob, you can find older versions of Slurm at the archive, at . Andy *From:* Bob Healey

Re: [slurm-users] Looking for old SLURM versions

2018-10-24 Thread Andy Riebs
Bob, you can find older versions of Slurm at the archive, at . Andy *From:* Bob Healey *Sent:* Wednesday, October 24, 2018 5:51PM *To:* Slurm-users *Cc:* *Subject:* [slurm-users] Looki

[slurm-users] Looking for old SLURM versions

2018-10-24 Thread Bob Healey
I'm in the process of upgrading a system that has been running 2.5.4 for the last 5 years with no issues. I'd like to bring that up to something current, but I need a bunch of older versions that do not appear to be online any longer to successfully migrate the database from ancient to curre

[slurm-users] Slurm versions 18.08.3 and 17.11.12 are now available

2018-10-24 Thread Tim Wickberg
We are pleased to announce the availability of Slurm versions 18.08.3 and 17.11.12. These versions include a fix for a regression introduced in 18.08.2 and 17.11.11 that could lead to a loss of accounting records if the slurmdbd was offline. All sites with 18.08.2 or 17.11.11 slurmctld process

[slurm-users] Can't find an address

2018-10-24 Thread Zohar Roe MLM
Hello, I have a node that for some reason changes state to "Down" every few minutes. When I change it with scontrol to "resume" it's OK until it goes Down again. In the slurm server log I can see the error: "agent/is_node_resp: node:myName1 RPC:REQUEST_PING : Can't find an address, check slurm.conf" Now, The

Re: [slurm-users] Slurm not building with HWLOC 2.0.2

2018-10-24 Thread Andreas Henkel
Thank you, Chris. I've come to think that I messed up our dev branch of Slurm. For some reason the plugins/task/ part was still on the 16.05.8 code base although the rest of the code was 17.11.7. I'm about to fix it right now. On 10/24/18 12:39 PM, Chris Samuel wrote: > On Wednesday, 24 October 2018 7:

Re: [slurm-users] Slurm not building with HWLOC 2.0.2

2018-10-24 Thread Chris Samuel
On Wednesday, 24 October 2018 7:16:58 PM AEDT Andreas Henkel wrote: > PS: sorry, I forgot to mention the Slurm version: it's 17.11.7 It's always worth checking the NEWS file in git for changes after the release you're on in case it's since been fixed. https://github.com/SchedMD/slurm/blob/slurm-17

Re: [slurm-users] pam_slurm_adopt does not constrain memory?

2018-10-24 Thread Chris Samuel
On Wednesday, 24 October 2018 8:20:26 PM AEDT Chris Samuel wrote: > However, we've seen it now too. For extra LULZ it's not consistent: I've even got two users' shells, one not constrained and the other constrained, on the same compute node, both started today (in the last 6 hours). :-/ Nothing

Re: [slurm-users] pam_slurm_adopt does not constrain memory?

2018-10-24 Thread Chris Samuel
On Friday, 24 August 2018 7:00:05 PM AEDT Christian Peter wrote: > we're using the same distro. Yeah, I think (because of the way we run things) whilst the RPM upgrade for systemd was installed into the OS image and sync'd out to the nodes, systemd hadn't been restarted, so we'd not noticed it by

Re: [slurm-users] Slurm not building with HWLOC 2.0.2

2018-10-24 Thread Andreas Henkel
PS: sorry, I forgot to mention the Slurm version: it's 17.11.7 On 10/24/18 9:43 AM, Andreas Henkel wrote: > > Hi all, > > did anyone build Slurm using a recent version of HWLOC like 2.0.1 or > 2.0.2? > > When I try to, I end up with > > task_cgroup_cpuset.c:486:40: error: 'struct hwloc_obj' has no mem

[slurm-users] Slurm not building with HWLOC 2.0.2

2018-10-24 Thread Andreas Henkel
Hi all, did anyone build Slurm using a recent version of HWLOC like 2.0.1 or 2.0.2? When I try to, I end up with task_cgroup_cpuset.c:486:40: error: 'struct hwloc_obj' has no member named 'allowed_cpuset' hwloc_bitmap_or(cpuset, cpuset, pobj->allowed_cpuset);