n't running, despite "idle"
resources, and maintenance of the topology.conf file.
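For anyone unfamiliar, topology.conf is just a switch/node hierarchy that has
to be kept in sync with the fabric; a minimal sketch (switch and node names
invented):

    SwitchName=leaf1 Nodes=node[01-16]
    SwitchName=leaf2 Nodes=node[17-32]
    SwitchName=spine Switches=leaf[1-2]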
John DeSantis
On Wed, 20 Jun 2018 12:16:59 -0400
Paul Edmon wrote:
> You will get whatever cores Slurm can find which will be an
> assortment of hosts.
>
> -Paul Edmon-
>
>
> On 6/20/201
is to use
`salloc` first, despite version 17.11.9 not needing `salloc` for an
"interactive" sessions.
Before we go further down this rabbit hole, were other sites affected by a
transition from SLURM versions 16.x, 17.x, or 18.x(?) to versions 20.x? If so, did
the methodology for multinode interactive MPI sessions change?
Thanks!
John DeSantis
20.02.x, but it seems that this
behaviour still exists. Are no other sites on fresh installs of >= SLURM
20.11.3 experiencing this problem?
I was aware of the changes in 20.11.{0..2}, which received a lot of scrutiny;
that is why 20.11.3 was selected.
Thanks,
John DeSantis
On 4/26/21 5:12
> mdc-1057-30-2
> mdc-1057-30-6
Thanks for that suggestion!
I imagine that this could be a bug then, i.e. specifying "--overlap" with
`srun` has no effect, while manually setting the variable does.
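For the record, the two approaches being compared look roughly like this,
assuming the variable in question is SLURM_OVERLAP (step commands are
illustrative):

    # had no effect in our testing:
    srun --overlap -n 1 hostname
    # worked:
    export SLURM_OVERLAP=1
    srun -n 1 hostname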
John DeSantis
On 4/28/21 11:27 AM, Juergen Salk wrote:
> Hi John,
>
>
that a
job was preempted in the application's output, or within the slurmctld logs.
When we switched to PreemptExemptTime, all application output and SLURM logs
stated preempted as the reason.
I know you want to suspend preempted jobs, but what happens if you cancel them
instead?
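In slurm.conf terms, the knob I'm thinking of would look something like this
(keep whatever PreemptType you already use; the exemption value is only
illustrative):

    PreemptMode=CANCEL
    PreemptExemptTime=00:05:00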
HTH,
John DeSantis
Has anyone else seen this?
Thank you,
John DeSantis
JobAcctGatherParams=OverMemoryKill in our environment to monitor and kill jobs
when the physical memory limit has been exceeded.
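For context, the relevant slurm.conf lines look something like the following
(the gather plugin and sampling frequency shown are illustrative):

    JobAcctGatherType=jobacct_gather/linux
    JobAcctGatherFrequency=task=30
    JobAcctGatherParams=OverMemoryKill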
Thank you,
John DeSantis
On 5/18/22 09:45, John DeSantis wrote:
Hello,
Due to the recent CVE posted by Tim, we did upgrade from SLURM 20.11.3 to
20.11.9.
Today, I receiv
ately, the
problem remained. Long story short, I upgraded to the latest stable version
this morning and the issue appears resolved.
Thanks!
John DeSantis
On 5/19/22 05:41, Luke Sudbery wrote:
We ran into a similar issue a while ago (not sure what versions were involved
though). Can't guarante
uxproc; sadly this
is an artifact from our testing days - and was never changed after our
move to production!
RTFM, dude!
John DeSantis
On Tue, 5 Sep 2017 11:40:15 -0600
John DeSantis wrote:
>
> Hello all,
>
> We were recently alerted by a user whose long running jobs (>= 6
Colas,
We had a similar experience a long time ago, and we solved it by adding
the following SchedulerParameters:
max_rpc_cnt=150,defer
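In slurm.conf form that is simply the line below; 150 is what worked for us,
so tune to taste:

    SchedulerParameters=max_rpc_cnt=150,defer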
HTH,
John DeSantis
On Thu, 11 Jan 2018 16:39:43 -0500
Colas Rivière wrote:
> Hello,
>
> I'm managing a small cluster (one head node, 24 worke
g
pertaining to the server thread count being over its limit.
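If it helps, `sdiag` should show the controller's server thread count
alongside its RPC statistics, e.g.:

    sdiag | grep -i 'server thread'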
HTH,
John DeSantis
On Fri, 12 Jan 2018 11:32:57 +0100
Alessandro Federico wrote:
> Hi all,
>
>
> we are setting up SLURM 17.11.2 on a small test cluster of about 100
> nodes. Sometimes we get the error in the subj
d defining a MaxWall via each QOS
(since one partition has 04:00:00 and the other 03:00:00).
The same could be done for the partitions skl_fua_{prod,bprod,lprod} as
well.
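Something along these lines via sacctmgr should do it (the QOS names are
placeholders; only the MaxWall values come from your partition limits):

    sacctmgr modify qos qos_fua_prod set MaxWall=04:00:00
    sacctmgr modify qos qos_fua_bprod set MaxWall=03:00:00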
HTH,
John DeSantis
On Tue, 16 Jan 2018 11:22:44 +0100
Alessandro Federico wrote:
> Hi,
>
> setting MessageTimeout
Matthieu,
> I would bet on something like LDAP requests taking too much time
> because of a missing sssd cache.
Good point! It's easy to forget to check something as "simple" as user
look-up when something is taking "too long".
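For posterity, the check can be as simple as timing a cold lookup on the
affected host ('someuser' being any LDAP-backed account):

    time id someuser
    time getent passwd someuser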
John DeSantis
On Tue, 16 J
> across the 6K
> nodes.
Ok, that makes sense. Looking initially at your partition definitions,
I immediately thought of being DRY, especially since the "finer" tuning
between the partitions could easily be controlled via the QOSes allowed
to access the resources.
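A purely hypothetical sketch of what I mean, with the per-QOS limits carrying
the differences and the partition entry staying thin (names and node list
invented):

    PartitionName=skl_fua Nodes=node[0001-0100] AllowQos=qos_prod,qos_bprod,qos_lprod State=UP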
John DeSantis
O
because their memory usage spiked during a JobAcctGatherFrequency
sampling interval (every 30 seconds, adjusted within slurm.conf).
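For anyone tuning the same thing, the interval lives in slurm.conf, e.g. the
30 second sampling mentioned above:

    JobAcctGatherFrequency=task=30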
John DeSantis
On Wed, 14 Feb 2018 13:05:41 +0100
Loris Bennett wrote:
> Geert Kapteijns writes:
>
> > Hi everyone,
> >
> > I’m running int
We avoid this hassle by ensuring that a user has a default qos, e.g.
`sacctmgr add user blah defaultqos=blah fairshare=blah`
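And to confirm it stuck, something like the following should do (field list
trimmed for readability):

    sacctmgr show assoc user=blah format=user,account,defaultqos,fairshare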
HTH,
John DeSantis
On Sat, 7 Apr 2018 16:32:40 +
Dmitri Chebotarov wrote:
> The MaxSubmitJobsPerUser seems to be working when QOS where
> MaxSubmitJobsPerUser is defin
I'm using slurm 16.05.10-2 and slurmdbd 16.05.10-2.
Thanks,
John DeSantis