Hi,
On 05/10/2021 08:45, Kevin Buckley wrote:
Trying to get my head around the extremely useful addition, from 21.08
onwards, of storing the batch scripts in the accounting database,
Are you aware of the two existing solutions to this that do not involve the
Slurm accounting DB?
https:/
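If it helps, here is a minimal sketch of the 21.08 mechanism itself, with option names as I understand them (please check the slurm.conf and sacct man pages for your exact release):

    # slurm.conf: ask the accounting DB to keep the submitted batch script (21.08+)
    AccountingStoreFlags=job_script

    # later, retrieve the stored script for a given job
    sacct -j <jobid> --batch-script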
Hi,
On Tue, Mar 10, 2020 at 05:49:07AM +, Rundall, Jacob D wrote:
> I need to update the configuration for the nodes in a cluster and I’d like to
> let jobs keep running while I do so. Specifically I need to add
> RealMemory= to the node definitions (NodeName=). Is it safe to do this
> for
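For what it is worth, a hypothetical node definition with RealMemory added; the values are made up, and whether `scontrol reconfigure` is sufficient or slurmctld/slurmd restarts are needed depends on the Slurm release:

    # slurm.conf (hypothetical values)
    NodeName=node[001-016] CPUs=48 RealMemory=191000 State=UNKNOWN

    # push the change out; older releases may need daemon restarts instead
    scontrol reconfigure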
Hi Brian,
On Mon, Oct 28, 2019 at 10:42:59AM -0700, Brian Andrus wrote:
> Ok, I had been planning on getting around to it, so this prompted me to do
> so.
>
> Yes, I can get slurm 19.05.3 to build (and package) under CentOS 8.
>
> There are some caveats, however since many repositories and package
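For reference, the usual tarball-based RPM build is along these lines, assuming the build dependencies (munge-devel, pam-devel, readline-devel, ...) are already installed:

    rpmbuild -ta slurm-19.05.3.tar.bz2
    # the resulting RPMs end up under ~/rpmbuild/RPMS/<arch>/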
Hi Tony,
On Mon, Oct 21, 2019 at 01:52:21AM +, Tony Racho wrote:
> Hi:
>
> We are planning to upgrade our Slurm cluster; however, we plan on NOT doing it
> in one go.
>
> We are on 18.08.7 at the moment (db, controller, clients)
>
> We'd like to do it in a phased approach.
>
> Stop communicat
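The generally documented order for a phased upgrade is slurmdbd first, then slurmctld, then the slurmd/client side, which may lag the controller by up to two major releases. A rough sketch, assuming the standard systemd unit names:

    systemctl stop slurmdbd  && <upgrade packages> && systemctl start slurmdbd
    systemctl stop slurmctld && <upgrade packages> && systemctl start slurmctld
    # then roll slurmd across the compute nodes in batches, e.g. per rack or partition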
Hi Jose,
On Thu, Sep 12, 2019 at 04:23:11PM +0200, Jose A wrote:
> Dear all,
>
> In the expansion of our cluster we are considering installing SLURM within a
> virtual machine in order to simplify updates and reconfigurations.
>
> Do any of you have experience running SLURM in VMs? I would rea
Hi Mark, Chris,
On Mon, Jul 15, 2019 at 01:23:20PM -0400, Mark Hahn wrote:
> > Could it be a RHEL7 specific issue?
>
> no - centos7 systems here, and pam_adopt works.
Can you show what your /etc/pam.d/sshd looks like?
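For comparison, the line the pam_slurm_adopt documentation asks for in /etc/pam.d/sshd is roughly the following; its placement relative to the other account modules is site-specific:

    # /etc/pam.d/sshd (fragment)
    account    required     pam_slurm_adopt.so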
Kind regards,
-- Andy
Hi Juergen,
On Fri, Jul 12, 2019 at 03:21:31PM +0200, Juergen Salk wrote:
> Dear all,
>
> I have configured pam_slurm_adopt in our Slurm test environment by
> following the corresponding documentation:
>
> https://slurm.schedmd.com/pam_slurm_adopt.html
>
> I've set `PrologFlags=contain` in slurm.
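For context, the slurm.conf side of a typical pam_slurm_adopt setup looks something like this (a sketch, not a drop-in configuration):

    # slurm.conf (sketch)
    PrologFlags=contain
    TaskPlugin=task/cgroup
    ProctrackType=proctrack/cgroup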
Hi,
> So here's something funny. One user submitted a job that requested 60 CPUs
> and 40M of memory. Our largest nodes in that partition have 72 CPUs and
> 256G of memory. So when a user requests 400G of RAM, what would be good
> behavior? I would like to see slurm reject the job, "job i
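If the goal is to have such jobs rejected at submission time rather than left pending forever, the slurm.conf option that usually comes up is EnforcePartLimits; see the slurm.conf man page for the exact semantics of YES vs ALL on your release:

    # slurm.conf
    EnforcePartLimits=ALL   # reject jobs that exceed the requested partition's limits at submit time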
Hi,
On Wed, Jul 03, 2019 at 03:49:44PM +, David Baker wrote:
> Hello,
>
>
> A few of our users have asked about running longer jobs on our cluster.
> Currently our main/default compute partition has a time limit of 2.5 days.
> Potentially, a handful of users need jobs to run up to 5 days. R
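One common pattern is a separate partition (or QOS) with a longer limit, restricted to the users who need it. A hypothetical sketch, with names and limits made up:

    # slurm.conf (hypothetical)
    PartitionName=long Nodes=node[001-016] MaxTime=5-00:00:00 AllowGroups=longjobs Default=NO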
Hi,
> On 22 Aug 2018, at 16:27, Christian Peter wrote:
>
> hi,
>
> we observed a strange behavior of pam_slurm_adopt regarding the involved
> cgroups:
>
> when we start a shell as a new Slurm job using "srun", the process has
> freezer, cpuset and memory cgroups setup as e.g.
> "/slurm/u
Hi Chris,
> On 24 Aug 2018, at 10:57, Christian Peter wrote:
>
> hi,
>
> thank you patrick, thank you kilian for identifying a systemd issue here!
>
> for a quick test, we disabled and masked systemd-logind. the "memory" cgroup
> now works as expected. great!
>
> we're now watching out fo
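For anyone wanting to reproduce the quick test described above, it amounts to the following; note that masking systemd-logind has side effects on regular login sessions, so treat it as a diagnostic rather than a fix:

    systemctl stop systemd-logind
    systemctl mask systemd-logind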
Hello,
We are migrating away from a Torque/Moab setup. For user convenience, we’re
trying to make the differences minimal.
I am wondering if there is a way to set the job walltime in the job environment
(to set $PBS_WALLTIME). It’s unclear to me how this information can be
retrieved on the wo
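One way to do this is a TaskProlog: any line it prints of the form `export NAME=value` is added to the task environment. A sketch, assuming PBS_WALLTIME should be in seconds and that scontrol reports TimeLimit as [days-]hours:minutes:seconds:

    #!/bin/bash
    # TaskProlog sketch (hypothetical): inject PBS_WALLTIME into the job environment
    limit=$(scontrol show job "$SLURM_JOB_ID" | tr ' ' '\n' | awk -F= '/^TimeLimit=/{print $2}')
    # split off an optional day part, then convert H:M:S to seconds
    # (jobs with TimeLimit=UNLIMITED would need special-casing)
    if [[ $limit == *-* ]]; then d=${limit%%-*}; hms=${limit#*-}; else d=0; hms=$limit; fi
    IFS=: read -r h m s <<< "$hms"
    echo "export PBS_WALLTIME=$(( (10#$d * 24 + 10#$h) * 3600 + 10#$m * 60 + 10#${s:-0} ))"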
Hi,
> On 3 Oct 2018, at 16:51, Andy Georges wrote:
>
> Hi all,
>
>> On 15 Sep 2018, at 14:47, Chris Samuel wrote:
>>
>> On Thursday, 13 September 2018 3:10:19 AM AEST Paul Edmon wrote:
>>
>>> Another way would be to make all your Linux users and
Hi all,
> On 15 Sep 2018, at 14:47, Chris Samuel wrote:
>
> On Thursday, 13 September 2018 3:10:19 AM AEST Paul Edmon wrote:
>
>> Another way would be to make all your Linux users and then map that into
>> Slurm using sacctmgr.
>
> At ${JOB} and ${JOB-1} we've wired user creation in Slurm int
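For reference, wiring existing Linux users into the Slurm database by hand boils down to sacctmgr calls along these lines (account and user names are made up; -i skips the confirmation prompt):

    sacctmgr -i add account projecta Description="Project A" Organization=dept
    sacctmgr -i add user alice Account=projecta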
Hi,
As per the guidelines on the slurmdbd.conf and sacct manual pages, I have set
PrivateData=jobs (amongst others) in slurmdbd.conf.
However, at this point no job information is available anymore when running
sacct, it just does not provide any job related output:
vsc40075@gligar03 (SLURM_
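With PrivateData=jobs a non-privileged user should still see their own jobs, so a useful first check is whether the user actually has an association in the database, and what sacct returns when the user and a time window are given explicitly (a diagnostic sketch):

    sacctmgr show user $USER withassoc
    sacct -u $USER -S now-7days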
> On 30 Apr 2018, at 22:37, Nate Coraor wrote:
>
> Hi Shawn,
>
> I'm wondering if you're still seeing this. I've recently enabled task/cgroup
> on 17.11.5 running on CentOS 7 and just discovered that jobs are escaping
> their cgroups. For me this is resulting in a lot of jobs ending in
> OU
Hi,
For some reason I am seeing memory cgroups disappear on the nodes:
[root@node3108 memory]# file $PWD/slurm
/sys/fs/cgroup/memory/slurm: cannot open (No such file or directory)
There is however a job running and other cgroups are still present:
[root@node3108 memory]# ls /sys/fs/cgroup/cp
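For completeness, the pieces that have to agree for per-job memory cgroups to be created and enforced are roughly these; this is a sketch of a typical cgroup v1 setup, not our exact configuration:

    # slurm.conf
    ProctrackType=proctrack/cgroup
    TaskPlugin=task/cgroup

    # cgroup.conf
    CgroupAutomount=yes
    ConstrainCores=yes
    ConstrainRAMSpace=yes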
Hello,
We are transitioning from Moab/Torque to Slurm.
I was wondering if there is a way to have Slurm also create the stdout (and
stderr) file for the job on the node (by default), rather than on the shared FS.
We sometimes have users who write a lot of stuff to stdout from their job
script
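As far as I know there is no single switch for this, but one workaround is to point the output at a node-local path and copy it back afterwards. The filename patterns %x and %j expand to job name and job id; the paths below are made up:

    #SBATCH --output=/tmp/%x-%j.out
    #SBATCH --error=/tmp/%x-%j.err
    # ... job commands ...
    # copy the node-local output back to the shared filesystem, e.g. at the end
    # of the job script or from an epilog
    cp /tmp/${SLURM_JOB_NAME}-${SLURM_JOB_ID}.* /shared/results/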
Hi,
> On 9 Mar 2018, at 21:58, Nicholas McCollum wrote:
>
> Connection refused makes me think a firewall issue.
>
> Assuming this is a test environment, could you try on the compute node:
>
> # iptables-save > iptables.bak
> # iptables -F && iptables -X
>
> Then test to see if it works. To
Hi all,
Cranked up the debug level a bit
Job was not started when using:
vsc40075@test2802 (banette) ~> /bin/salloc -N1 -n1 /bin/srun --pty bash -i
salloc: Granted job allocation 42
salloc: Waiting for resource configuration
salloc: Nodes node2801 are ready for job
For comparison purposes, runn
cessary.
>
> The most simple command that I typically use is:
>
> srun -N1 -n1 --pty bash -i
>
> Mike
>
>> On 3/9/18 10:20 AM, Andy Georges wrote:
>> Hi,
>>
>>
>> I am trying to get interactive jobs to work from the machine we use as a
>>
Hi,
I am trying to get interactive jobs to work from the machine we use as a login
node, i.e., where the users of the cluster log into and from where they
typically submit jobs.
I submit the job as follows:
vsc40075@test2802 (banette) ~> /bin/salloc -N1 -n1 /bin/srun bash -i
salloc: Granted
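On the releases current at the time (pre-20.11), the usual way to have salloc drop the user into a shell on the allocated node was SallocDefaultCommand in slurm.conf; newer releases use LaunchParameters=use_interactive_step instead. A sketch of the old-style setting, along the lines of the example that used to be in the Slurm FAQ:

    # slurm.conf (pre-20.11 style; adjust the srun options to taste)
    SallocDefaultCommand="srun -n1 -N1 --mem-per-cpu=0 --pty --preserve-env --mpi=none $SHELL"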