We had this issue recently. Some googling led me to the NERSC FAQs,
which state:
> _is_a_lwp is a function called internally for Slurm job accounting. The
> message indicates a rare error situation with a function call. But the error
> shouldn't affect anything in the user job. Please ignore t
This started working for me this morning. I have no idea why; maybe it was
the multiple restarts of the various daemons that did it.
-Original Message-
From: slurm-users On Behalf Of Brian W. Johanson
Sent: Tuesday, February 4, 2020 1:35 PM
To: slurm-users@lists.schedmd.com
I am looking to test one with Slurm.
I have compiled the code for Ubuntu 18; however,
I am still working through gres.conf, etc.
Thanks
Christopher J. Cawley
Systems Engineer/Linux Engineer, Information Technology Services
223 Aquia Building, Ffx, MSN: 1B5
George Mason University
Phon
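For anyone setting this up from scratch, a minimal gres.conf sketch for a
single-GPU node, assuming one NVIDIA device at /dev/nvidia0 (node name, GPU
type and device path are placeholders, not taken from this thread):
  # gres.conf on the GPU node
  NodeName=gpunode01 Name=gpu Type=v100 File=/dev/nvidia0
  # slurm.conf must also declare the GRES and attach it to the node definition
  GresTypes=gpu
  NodeName=gpunode01 Gres=gpu:v100:1 CPUs=... RealMemory=...
With something like that in place, a job can request the resource with
sbatch --gres=gpu:1.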
We have a user who keeps encountering this error with one type of her jobs.
Sometimes her jobs are cancelled and other times they run fine.
slurmstepd: error: _is_a_lwp: open() /proc/195420/status failed: No such file
or directory
slurmstepd: error: *** JOB 17534 ON pe2dc5-0007 CANCELLED AT
2
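For context, _is_a_lwp, as the name suggests, reads /proc/<pid>/status for the
job accounting code to tell a lightweight process (thread) from a real process,
so ENOENT just means the process had already exited when the check ran. A rough
shell illustration of the same lookup (195420 is reused from the log above;
this is not Slurm's actual code):
  pid=195420
  if [ -r /proc/$pid/status ]; then
      # Tgid vs. Pid is what distinguishes a thread (LWP) from its main process
      grep -E '^(Tgid|Pid):' /proc/$pid/status
  else
      echo "process $pid is already gone: /proc/$pid/status does not exist"
  fi
The open() failure itself matches the "harmless" description in the NERSC FAQ
quoted earlier in this thread.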
Please include the output of:
scontrol show node=liqidos-dean-node1
scontrol show partition=Partition_you_are_attempting_to_submit_to
as well as any other #SBATCH lines submitted with the failing job.
On 2/4/20 9:42 AM, dean.w.schu...@gmail.com wrote:
I've already restarted slurmctld and slurmd on a
Hello,
I've taken a very good look at our cluster; however, I have not yet made any
significant changes. The one change that I did make was to increase the
"jobsizeweight". That's now our dominant parameter, and it does ensure that our
largest jobs (> 20 nodes) are making it to the top of the sprio list
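For anyone tuning the same thing, a sketch of where that weight lives, assuming
"jobsizeweight" refers to slurm.conf's PriorityWeightJobSize and using made-up
numbers (check your own slurm.conf for the real values):
  # slurm.conf (illustrative weights only)
  PriorityType=priority/multifactor
  PriorityWeightJobSize=100000
  PriorityWeightAge=1000
  PriorityWeightFairshare=10000
  # pick up the change, then inspect the per-factor breakdown of pending jobs
  scontrol reconfigure
  sprio -l | head -20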
I've already restarted slurmctld and slurmd on all nodes. Still get the same
problem.
-Original Message-
From: slurm-users On Behalf Of Marcus Wagner
Sent: Tuesday, February 4, 2020 2:31 AM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] sbatch script won't accept --gres t
Hi,
to your first question: I don't know the exact reason, but SchedMD made
it pretty clear that there is a specific sequence for updates:
slurmdbd -> slurmctld -> slurmd -> commands
See https://slurm.schedmd.com/SLUG19/Field_Notes_3.pdf (or any of the
other field notes) for details.
So, I'd advise
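A rough sketch of that order on a systemd-based installation (the unit names
and the package step are assumptions; adapt them to your site):
  # 1. the database daemon first (on the slurmdbd host)
  systemctl stop slurmdbd
  # ... install the new slurmdbd packages ...
  systemctl start slurmdbd
  # 2. then the controller (on the slurmctld host)
  systemctl stop slurmctld
  # ... install the new slurmctld packages ...
  systemctl start slurmctld
  # 3. then slurmd on the compute nodes, and finally the client commands
The reason for the order is that each daemon tolerates peers up to two major
releases older than itself, but not newer ones, so the component furthest
upstream has to be upgraded first.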
Hi David.
I'd love to hear back about the changes that you make and how they affect
the performance of your scheduler.
Any chance you could let us know how things go?
Killian
On Tue, 4 Feb 2020 at 10:43, David Baker wrote:
> Hello,
>
> Thank you very much again for your comments and the detai
Hello everyone,
in the RELEASE_NOTES I read the following:
Slurm can be upgraded from version 18.08 or 19.05 to version 20.02 without
loss of jobs or other state information. Upgradi
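If you want to confirm what you are currently running before attempting the
jump, a couple of standard client commands will tell you (the node name is a
placeholder):
  sinfo --version
  scontrol version
  # the Version= field shows what slurmd on a given node reports
  scontrol show node <nodename> | grep -i version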
Hello,
Thank you very much again for your comments and the details of your slurm
configuration. All the information is really useful. We are working on our
cluster right now and making some appropriate changes. We'll see how we get on
over the next 24 hours or so.
Best regards,
David
Hi! How can I debug a job that fails without any output or indication?
The job I start with sbatch has the following form:
#!/bin/bash
#SBATCH --job-name QCUT_SEV
#SBATCH -p CLUSTER # Partition to submit to
#SBATCH --output=%x_%j.out # File to which STDOUT will be written
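A few generic first steps that usually narrow this kind of thing down (job id
12345 is a placeholder):
  # state, exit code and the node(s) it ran on, after the job has ended
  sacct -j 12345 --format=JobID,State,ExitCode,Elapsed,NodeList
  # full job record while it is still pending or running
  scontrol show job 12345
Also check that the directory the --output file should land in exists and is
writable from the compute node; with a relative path the file is created in
the directory you submitted from.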
Hello everyone,
I have a small question about Slurm 20.02pre1 and compiling.
If I compile this version it looks fine and I get no error messages back.
This is my configure command line:
./configure --with-munge=/usr/bin/
--with-pmix=/usr/local/openpmix:/usr/local/lib/openmpi
--enabl
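In case it helps, a couple of generic checks on what configure actually picked
up (the greps only inspect files configure itself generates; nothing here is
specific to 20.02pre1):
  # did configure find pmix and munge where you pointed it?
  grep -iE 'pmix|munge' config.log
  # after make && make install, confirm the pmix plugin is available
  srun --mpi=list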
Hi Dean,
could you please try to restart the slurmctld?
This usually helps at our site.
I have never seen this happen with gres, but I have seen it many other times.
That is why we restart slurmctld once a day via a cron job.
Best
Marcus
On 2/4/20 12:59 AM, Dean Schulze wrote:
When I run an sbatch script with t
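For what it's worth, a minimal sketch of such a cron entry, assuming systemd
and a 04:00 restart time (both are just examples, not the poster's actual
setup):
  # /etc/cron.d/slurmctld-restart on the controller host
  0 4 * * * root /usr/bin/systemctl restart slurmctld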