Re: [slurm-users] Restoring Slurm

2018-04-09 Thread Ole Holm Nielsen
On 04/09/2018 10:54 PM, Roberts, John E. wrote: The documentation is a little unclear to me, so I was wondering how to do a complete backup and restore of Slurm for testing and/or disaster recovery. I'm looking to upgrade Slurm from 16.05.10 to the latest and I'm not sure all of what should go. I

[slurm-users] Restoring Slurm

2018-04-09 Thread Roberts, John E.
Hi, The documentation is a little unclear to me, so I was wondering how to do a complete backup and restore of Slurm for testing and/or disaster recovery. I'm looking to upgrade Slurm from 16.05.10 to the latest and I'm not sure all of what should go. I stood up some VMs to test this upgrade and m

[slurm-users] PID file /var/run/slurm/slurmd.pid not readable (yet?) after start:

2018-04-09 Thread Christian Goll
Hello list, we have encountered the issue that with SUSE, on the startup of every Slurm daemon, we get an error message like the following: PID file /var/run/slurm/slurmd.pid not readable (yet?) after start: Can anyone please confirm that this is not a SUSE-only problem? This message is generated b

[slurm-users] Slurm and memory

2018-04-09 Thread Dmitri Chebotarov
Hello, I'm trying to figure out how to change SLURM's behavior on gathering free memory from nodes. At this time 'sinfo' reports 'free' memory from the node (and not 'available'). For example: #sinfo -eN -o %N,%m,%e,%C ... NODE067,64170,14672,0/32/0/32 I can see that NODE067 has no jobs running - 0 CPUs
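
For anyone reading the sample line above, here is a minimal sketch that splits one line of `sinfo -eN -o %N,%m,%e,%C` output into named fields. It assumes the usual meaning of the format specifiers (`%m` is the configured memory per node in MB, `%e` is the free memory in MB, `%C` is CPUs as allocated/idle/other/total), and that every field is numeric; a node reporting a non-numeric value for free memory would need extra handling. The names `NodeMemory` and `parse_sinfo_line` are made up for illustration and are not part of Slurm.

```python
from typing import NamedTuple


class NodeMemory(NamedTuple):
    """One node's line from `sinfo -eN -o %N,%m,%e,%C` (hypothetical helper type)."""
    node: str
    total_mb: int        # %m: configured memory per node, in MB
    free_mb: int         # %e: free memory reported by the node, in MB
    cpus_allocated: int  # %C is allocated/idle/other/total
    cpus_idle: int
    cpus_other: int
    cpus_total: int


def parse_sinfo_line(line: str) -> NodeMemory:
    """Split a comma-separated sinfo line into typed fields.

    Assumes exactly four comma-separated fields and numeric values throughout.
    """
    node, total, free, cpus = line.strip().split(",")
    allocated, idle, other, cpu_total = (int(x) for x in cpus.split("/"))
    return NodeMemory(node, int(total), int(free), allocated, idle, other, cpu_total)


# The sample line quoted in the message above.
record = parse_sinfo_line("NODE067,64170,14672,0/32/0/32")
print(record.node, record.total_mb, record.free_mb, record.cpus_allocated)
```

With the sample line, `cpus_allocated` is 0 while `free_mb` is well below `total_mb`, which is the mismatch the message describes.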

Re: [slurm-users] SLURM's reservations

2018-04-09 Thread De Giorgi Jean-Claude
Hello Sébastien, Thanks for your answer. I found what I needed. Regards, Jean-Claude From: slurm-users on behalf of Sébastien VIGNERON Reply-To: Slurm User Community List Date: Friday, 6 April 2018 at 17:12 To: Slurm User Community List Subject: Re: [slurm-users] SLURM's reservations Hi,

[slurm-users] Job Preemption Suspend/Resume

2018-04-09 Thread Nicolò Parmiggiani
Hi, I have a problem with preemption. When a high-priority job suspends a lower-priority job, all is OK. But when the high-priority job ends and Slurm resumes the lower-priority one, it sometimes happens that the resumed job never ends: the status is “R” and the time keeps running, but the job does nothing. I’ve seen