On 7/7/20 5:57 pm, Jason Simms wrote:
Failed to look up user weissp: No such process
That looks like the user isn't known to the node. What do these say:
id weissp
getent passwd weissp
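If the node can resolve the account (e.g. via LDAP/SSSD), both of those should
print the user's entry and exit with status 0; if either fails with a non-zero
status, the node simply doesn't know the user, which would fit the loginctl
error. For example:

getent passwd weissp; echo $?
id weissp; echo $?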
Which version of Slurm is this?
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Ber
Now that is interesting. If I do:
loginctl enable-linger weissp
Then I get the following error:
Failed to look up user weissp: No such process
This is one of the users that always fails. But if I run it for myself with:
loginctl enable-linger simmsj
Everything works (as expected).
Any thoughts?
Hi Jason,
What happens when you try to run that command on the node? Is the exit
status of the command 0?
e.g. for my servers, where lingering is masked, I get
[root@thespian-gpgpu001 ~]# loginctl enable-linger scrosby
Could not enable linger: Unit is masked.
[root@thespian-gpgpu001 ~]# echo $?
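It may also be worth checking whether logind thinks the user is lingering at
all; enable-linger normally just drops a flag file for the user, so something
like the following (property name and path from memory, so worth
double-checking) can confirm the state:

loginctl show-user weissp --property=Linger
ls -l /var/lib/systemd/linger/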
On Wed, 8 Jul 2020 at 00:47, zaxs84 wrote:
> Hi Sean,
> thank you very much for your reply.
>
> > If a lower priority job can start AND finish before the resources a
> > higher priority job requires are available, the backfill scheduler
> > will start the lower priority job.
>
Hi,
We use Job QOS and Resource Reservations for this purpose. A QOS is a good
option for a "permanent" change to a user's resource limits. We use
reservations, similarly to how you're currently using partitions, to
"temporarily" provide a resource boost without the complexities of
re-partitioning.
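As a rough sketch (the QOS, reservation and user names here are made up, and
the syntax is from memory, so check the sacctmgr and scontrol man pages before
copying), the two approaches look something like:

sacctmgr add qos Name=long Flags=PartitionTimeLimit MaxWall=7-00:00:00
sacctmgr modify user where user=alice set QOS+=long
(jobs then run with: sbatch --qos=long ...)

scontrol create reservation ReservationName=boost Users=alice,bob StartTime=now Duration=7-00:00:00 NodeCnt=4
(jobs then run with: sbatch --reservation=boost ...)

The PartitionTimeLimit flag is what should let the QOS wall-time limit apply
instead of the partition's MaxTime.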
Hello,
We have a Slurm system with partitions set to a max runtime of 24 hours. What
would be the proper way to allow a certain set of users to run jobs on the
current partitions beyond the partition limits? In the past we would isolate
some nodes based on their job requirements, make a new partition
Hi all.
Is there a scheduler option that allows low priority jobs to be immediately
paused (or even stopped) when jobs with higher priority are submitted?
Related to this question, I am also a bit confused about how "scontrol
suspend" works; my understanding is that a job that gets suspended rece
Hello all,
Two users on my system experience job failures every time they submit a job
via sbatch. When I run their exact submission script, or when I create a
local system user and launch from there, the jobs run fine. Here is an
example of what I see in the slurmd log:
[2020-07-06T15:02:41.284]
Hi Sean,
thank you very much for your reply.
> If a lower priority job can start AND finish before the resources a
higher priority job requires are available, the backfill scheduler will
start the lower priority job.
That's very interesting, but how can the scheduler predict how long a
low-priority job will run?
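For what it's worth, backfill doesn't really predict anything: as far as I
know it simply trusts each job's time limit (--time, or the partition default)
as the run-time estimate, which is why accurate limits matter so much. A
minimal illustration (values only for the example):

sbatch --time=01:00:00 --partition=low short_job.sh

and in slurm.conf:

SchedulerType=sched/backfill
SchedulerParameters=bf_window=2880,bf_continue

The tighter the requested time limit, the easier it is for the scheduler to
slot the job in ahead of the big 24-core job without delaying it.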
Hi,
What you have described is how the backfill scheduler works. If a lower
priority job can start AND finish before the resources a higher priority
job requires are available, the backfill scheduler will start the lower
priority job.
Your high priority job requires 24 cores, whereas the lower priority job
Hi all.
We want to achieve a simple thing with Slurm: launch "normal" jobs, and be
able to launch "high priority" jobs that run as soon as possible. That's it.
However we cannot achieve this in a reliable way: our current config
sometimes works and sometimes doesn't, and this is driving us crazy.
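One common pattern (sketched from memory; the QOS name, priority value and
weights are placeholders) is a high-priority QOS that only the people who need
it can use, with the QOS factor given real weight in the multifactor priority
plugin:

sacctmgr add qos Name=urgent Priority=10000
sacctmgr modify user where user=alice set QOS+=urgent
(submit with: sbatch --qos=urgent job.sh)

and in slurm.conf:

PriorityType=priority/multifactor
PriorityWeightQOS=100000

If "as soon as possible" really means pausing running work rather than just
jumping the queue, that would be Slurm's preemption support
(PreemptType/PreemptMode) on top of this.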