Hi
Does anyone know the recommended amount of memory to give Slurm's
MariaDB database server?
I seem to remember reading a simple estimate based on the size of certain
tables (or something along those lines), but I can't find it now.
Thanks
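For what it's worth, the Slurm accounting documentation gives InnoDB tuning advice rather than a single memory formula. A minimal sketch of the relevant settings (the file path and sizes here are assumptions for a dedicated database host, not official recommendations):

```
# /etc/my.cnf.d/slurmdbd.cnf -- hypothetical path; adjust sizes to your host
[mysqld]
innodb_buffer_pool_size  = 4096M  # large enough to cache the job/step tables
innodb_log_file_size     = 64M
innodb_lock_wait_timeout = 900
```

The buffer pool is the setting that matters most; on a host dedicated to slurmdbd it is commonly sized to a large fraction of RAM.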
I want to allow users to lower the priority of their jobs so that other
people's jobs can go first, and I'm thinking the easiest way would be for them
to use the sbatch nice option. However, all of our jobs currently run with
a priority of 1, as all of the priority weights are set to zero, meaning
setti
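A sketch of how the nice mechanism is normally used (the script name and job id below are placeholders):

```
# Submit with a reduced priority; a higher nice value means lower priority
sbatch --nice=100 job.sh

# Or lower the priority of an already-queued job
scontrol update JobId=12345 Nice=100
```

Note that with all priority weights at zero the scheduler may effectively be FIFO, in which case the nice adjustment would have nothing to act on.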
Thanks for everyone's help. All I needed to do was compile a new version of
pam_slurm.so. I'm aware there's a newer pam_slurm_adopt, but everything was
already set up for pam_slurm.so, so I just went with that.
Regards
Lloyd
On Wed, Jul 27, 2022 at 9:45 PM Bernd Melchers
wrote:
> >This happen
. And it was, until I did the upgrade.
On Fri, Jul 29, 2022 at 7:00 AM Loris Bennett
wrote:
> Hi Byron,
>
> byron writes:
>
> > Hi Loris - about a second
>
> What is the use-case for that? Are these individual jobs or is it a job
> array? Either way it sounds to me li
Hi Loris - about a second
On Thu, Jul 28, 2022 at 2:47 PM Loris Bennett
wrote:
> Hi Byron,
>
> byron writes:
>
> > Hi
> >
> > We recently upgraded slurm from 19.05.7 to 20.11.9 and now we
> occasionally (3 times in 2 months) have slurmctld hanging so we get the
Hi
We recently upgraded slurm from 19.05.7 to 20.11.9, and now we occasionally
(3 times in 2 months) have slurmctld hanging, so we get the following
message when running sinfo:
“slurm_load_jobs error: Socket timed out on send/recv operation”
It only seems to happen when one of our users runs a job
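For anyone hitting the same symptom: that error generally means the client gave up waiting on a busy or hung slurmctld within slurm.conf's MessageTimeout (default 10 seconds). Raising it is only a mitigation sketch, not a fix for the underlying hang:

```
# slurm.conf fragment -- give clients longer to wait on a busy slurmctld
MessageTimeout=30
```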
at 17:22, Brian Andrus wrote:
>
>> Verify that their uid on the node is the same as the uid your master sees
>>
>> Brian Andrus
>>
>>
>> On 7/27/2022 8:53 AM, byron wrote:
>> > Hi
>> >
>> > When a user tries to login into a
Hi
When a user tries to log in to a compute node on which they have a running
job, they get the error
Access denied: user blahblah (uid=) has no active jobs on this node.
Authentication failed.
I recently upgraded slurm to 20.11.9 and was under the impression that
prior to the upgrade they w
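For reference, node-login restrictions like this usually come from a pam_slurm line in the node's SSH PAM stack; a minimal sketch (the file path and whether the module path needs to be absolute vary by distro, so treat these as assumptions):

```
# /etc/pam.d/sshd fragment -- deny SSH to users with no running job on the node
account    required     pam_slurm.so
```

If the module was built against the old slurm libraries, an upgrade can break it, which is why recompiling pam_slurm.so (or switching to pam_slurm_adopt) is the usual remedy.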
On Mon, May 30, 2022 at 8:18 AM Ole Holm Nielsen
wrote:
> Hi Byron,
>
> Adding to Stephan's note, it's strongly recommended to make a database
> dry-run upgrade test before upgrading the production slurmdbd. Many
> details are in
> https://wiki.fysik.dtu.dk/niflheim/Sl
Hi
I'm currently doing an upgrade from 19.05 to 20.11.
All of our compute nodes share the same NFS-mounted install of slurm. The
system has been set up so that all the start scripts and configuration files
point to the default installation, which is a soft link to the most recent
installation of sl
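The soft-link scheme described above can be sketched as follows; the directory names and the /tmp location are placeholders for demonstration:

```shell
# Each slurm release lives in its own directory on the shared filesystem;
# "default" is a symlink that the start scripts and config files point at.
mkdir -p /tmp/slurm-demo/slurm-19.05.7 /tmp/slurm-demo/slurm-20.11.9
cd /tmp/slurm-demo

# Initially the link targets the old release
ln -sfn slurm-19.05.7 default

# Upgrading is then just retargeting the link: -f replaces the existing
# link, -n stops ln from descending into the old symlink target
ln -sfn slurm-20.11.9 default
readlink default   # -> slurm-20.11.9
```

The nice property of this layout is that rolling back is the same one-line operation in the other direction.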
> As for old versions of slurm I think at this point you would need to
> contact SchedMD. I'm sure they have past releases they can hand out if you
> are bootstrapping to a newer release.
>
> -Paul Edmon-
> On 5/17/22 11:42 AM, byron wrote:
>
> Thanks Brian for the speedy
n (although they started at 18.x) with no
> issues. Running jobs were not impacted and users didn't even notice.
>
> Brian Andrus
>
>
> On 5/17/2022 7:35 AM, byron wrote:
> > Hi
> >
> > I'm looking at upgrading our install of slurm from 19.05 to 20.
Hi
I'm looking at upgrading our install of slurm from 19.05 to 20.11 in
response to the recently announced security vulnerabilities.
I've been through the documentation / forums and have managed to find the
answers to most of my questions but am still unclear about the following
- In upgrading t
I’m trying to replicate the setup of a new account where there is a new
“grouping” of accounts and a new account that will actually be used, so
something like this when you run
sacctmgr show assoc tree
mycluster account1 (which is just being used to group accounts
and so has no GrpTRE
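A grouping like that is normally built by creating the parent account first and then the working account underneath it; a sketch with placeholder names (account1, myproject) and descriptions:

```
# Parent "grouping" account -- exists only to organise the tree
sacctmgr add account account1 Description="grouping account" Organization=org

# The account users will actually run under
sacctmgr add account myproject parent=account1 Description="project account" Organization=org
```

`sacctmgr show assoc tree` should then display myproject indented beneath account1.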
Hi
I've been looking at using strigger for some simple cases such as when a
node drains or goes down. Most of the examples I've seen use the format
whereby it calls a script which reruns the strigger command for the next
event.
However, there is also the "--flags=perm" approach; is there any
dis
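For reference, the two styles being compared might look like this (the handler script path is a placeholder):

```
# One-shot: the trigger fires once, so the handler script must
# re-register the trigger itself for the next event
strigger --set --down --program=/usr/local/sbin/node_down.sh

# Permanent: the trigger survives firing and needs no re-registration
strigger --set --down --flags=perm --program=/usr/local/sbin/node_down.sh
```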
Thanks for all the feedback; I'm going with Juergen's MaxSubmitJobs approach.
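The MaxSubmitJobs approach presumably amounts to something like the following (the account name is a placeholder):

```
# Block new submissions without removing users or zeroing remaining hours
sacctmgr modify account frozen_acct set MaxSubmitJobs=0

# Later, lift the freeze by clearing the limit
sacctmgr modify account frozen_acct set MaxSubmitJobs=-1
```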
On Thu, Oct 7, 2021 at 2:55 AM Chris Samuel wrote:
> On 6/10/21 6:21 am, byron wrote:
>
> > We have some accounts that we would like to suspend / freeze for the
> > time being that have unused hours as
We have some accounts that we would like to suspend / freeze for the time
being that have unused hours associated with them. Is there any way of
doing this without removing the users associated with the accounts or
zeroing their hours?
We are using slurm version 19.05.7
Thanks
Thanks for your help.
On Wed, Sep 29, 2021 at 7:49 PM Paul Brunk wrote:
> Hello Byron:
>
>
>
> I’m guessing that your job is asking for more HW than the highmem_p
>
> has in it, or more cores or RAM within a node than any of the nodes
>
> have, or something like that.
the job that is stuck in state pending
JOBID     PARTITION  NAME      USER   ST  TIME  NODES  NODELIST(REASON)
10860160  highmem    MooseBen  byron  PD  0:00  16     (PartitionConfig)
$ sinfo -p highmem
PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
highmem    up     infinit
> ay yes, else it doesn't make the change.
>
> Brian Andrus
>
> On 9/8/2021 3:41 AM, byron wrote:
> > Hi
> >
> > I've added a new account using sbank and have now discovered it should
> > have been added with the parent set. We've already accumulated a
Hi
I've added a new account using sbank and have now discovered it should have
been added with the parent set. We've already accumulated a couple of
months of user data, so I don't just want to delete it and recreate it in the
correct location. I've had a read of the sacctmgr command and think I m
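Rather than deleting and recreating, sacctmgr can move an existing account under a new parent while keeping its accumulated usage; a sketch with placeholder names:

```
# Re-parent the account in place; historical usage stays attached to it
sacctmgr modify account myacct set parent=correct_parent
```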