Re: [slurm-users] slurmstepd: error: _is_a_lwp

2020-02-04 Thread Marcus Boden
We had this issue recently. Some googling led me to the NERSC FAQs, which state: "_is_a_lwp is a function called internally for Slurm job accounting. The message indicates a rare error situation with a function call. But the error shouldn't affect anything in the user job. Please ignore t

Re: [slurm-users] sbatch script won't accept --gres that requires more than 1 gpu

2020-02-04 Thread dean.w.schulze
This started working for me this morning. I have no idea why it started to work. Maybe it was multiple restarts of the various daemons that did it.

[slurm-users] Anyone have success with Nvidia Jetson nano

2020-02-04 Thread Christopher J Cawley
I am looking to test one with Slurm. I have compiled the code for Ubuntu 18; however, I am still working through gres.conf, etc. Thanks, Christopher J. Cawley, Systems Engineer/Linux Engineer, Information Technology Services, George Mason University
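For a single-GPU node like the Jetson, gres.conf often takes the minimal shape below. This is a sketch, not a tested Jetson config: the node name is made up, and the Jetson's integrated Tegra GPU may not be exposed as /dev/nvidia0 at all (it uses /dev/nvhost-* device files), so the File= line in particular needs checking against the actual hardware.

```
# gres.conf — minimal single-GPU sketch (device path is an assumption;
# Tegra's integrated GPU may not appear as /dev/nvidia0)
NodeName=nano01 Name=gpu Count=1 File=/dev/nvidia0

# matching lines needed in slurm.conf:
#   GresTypes=gpu
#   NodeName=nano01 ... Gres=gpu:1
```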

[slurm-users] slurmstepd: error: _is_a_lwp

2020-02-04 Thread Luis Huang
We have a user who keeps encountering this error with one type of her jobs. Sometimes her jobs are cancelled and other times they run fine. slurmstepd: error: _is_a_lwp: open() /proc/195420/status failed: No such file or directory slurmstepd: error: *** JOB 17534 ON pe2dc5-0007 CANCELLED AT 2

Re: [slurm-users] sbatch script won't accept --gres that requires more than 1 gpu

2020-02-04 Thread Brian W. Johanson
Please include the output for: scontrol show node=liqidos-dean-node1 scontrol show partition=Partition_you_are_attempting_to_submit_to and any other #SBATCH lines submitted with the failing job. On 2/4/20 9:42 AM, dean.w.schu...@gmail.com wrote: I've already restarted slurmctld and slurmd on a
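The diagnostic commands requested above take the following general shape (the node name is from the thread; the partition name is a placeholder to be replaced with the partition the job targets):

```
# show the node's configured vs. allocated GRES, state, and any Reason field
scontrol show node=liqidos-dean-node1

# show the partition definition the job is submitted to (placeholder name)
scontrol show partition=gpu_partition
```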

Re: [slurm-users] Longer queuing times for larger jobs

2020-02-04 Thread David Baker
Hello, I've taken a very good look at our cluster, however as yet I have not made any significant changes. The one change that I did make was to increase the "jobsizeweight". That's now our dominant parameter and it does ensure that our largest jobs (> 20 nodes) are making it to the top of the sprio l
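The "jobsizeweight" referred to here is the PriorityWeightJobSize factor of Slurm's multifactor priority plugin, set in slurm.conf. A sketch of raising it until it dominates the calculation (the specific values are illustrative, not taken from the original post):

```
# slurm.conf — multifactor priority, job size made the dominant factor
PriorityType=priority/multifactor
PriorityWeightJobSize=100000   # large jobs rise to the top of sprio
PriorityWeightAge=1000
PriorityWeightFairshare=10000
```

After a reconfigure, `sprio -l` shows the per-factor contributions, which makes it easy to confirm the job-size term is actually dominating.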

Re: [slurm-users] sbatch script won't accept --gres that requires more than 1 gpu

2020-02-04 Thread dean.w.schulze
I've already restarted slurmctld and slurmd on all nodes. Still get the same problem.

Re: [slurm-users] slurm-users Digest, Vol 28, Issue 8

2020-02-04 Thread Loris Bennett

Re: [slurm-users] slurm-users Digest, Vol 28, Issue 8

2020-02-04 Thread Matthias Krawutschke

Re: [slurm-users] Upgrade or /-date to Release 20.02p1 .....

2020-02-04 Thread Marcus Boden
Hi, to your first question: I don't know the exact reason, but SchedMD made it pretty clear that there is a specific sequence for updates: slurmdbd -> slurmctld -> slurmd -> commands. See https://slurm.schedmd.com/SLUG19/Field_Notes_3.pdf (or any of the other field notes) for details. So, I'd advis
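The documented order can be sketched as a sequence of service restarts. Host placement and the package-upgrade step are placeholders here, not details from the original post:

```
# 1. accounting daemon first (on the dbd host)
systemctl stop slurmdbd      # upgrade the slurm packages, then:
systemctl start slurmdbd

# 2. then the controller
systemctl restart slurmctld

# 3. then slurmd on each compute node (typically rolling)
systemctl restart slurmd

# 4. finally the client commands (sbatch, squeue, ...) on login nodes
```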

Re: [slurm-users] Longer queuing times for larger jobs

2020-02-04 Thread Killian Murphy
Hi David. I'd love to hear back about the changes that you make and how they affect the performance of your scheduler. Any chance you could let us know how things go? Killian On Tue, 4 Feb 2020 at 10:43, David Baker wrote: > Hello, > > Thank you very much again for your comments and the detai

[slurm-users] Upgrade or /-date to Release 20.02p1 .....

2020-02-04 Thread Matthias Krawutschke
Hello together, on the RELEASE_NOTES I read the following: Slurm can be upgraded from version 18.08 or 19.05 to version 20.02 without loss of jobs or other state information. Upgradi

Re: [slurm-users] Longer queuing times for larger jobs

2020-02-04 Thread David Baker
Hello, Thank you very much again for your comments and the details of your slurm configuration. All the information is really useful. We are working on our cluster right now and making some appropriate changes. We'll see how we get on over the next 24 hours or so. Best regards, David

[slurm-users] sbatch : job fail without any output or indication

2020-02-04 Thread Adrian Sevcenco
Hi! How can I debug a job that fails without any output or indication? My job, started with sbatch, has the following form: #!/bin/bash #SBATCH --job-name QCUT_SEV #SBATCH -p CLUSTER # Partition to submit to #SBATCH --output=%x_%j.out # File to which STDOUT will be writ
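A common pattern for flushing out silent failures is to capture stderr in its own file and make the script announce itself before doing any real work. The job name, partition, and output pattern below are from the post; the --error line and the set/echo debugging lines are an added suggestion:

```shell
#!/bin/bash
#SBATCH --job-name QCUT_SEV
#SBATCH -p CLUSTER               # Partition to submit to
#SBATCH --output=%x_%j.out       # STDOUT
#SBATCH --error=%x_%j.err        # capture STDERR separately

set -euxo pipefail               # echo every command, abort on first error
echo "running on $(hostname) as $(whoami) in $PWD"
```

If the .err file never appears, the job likely died before the script ran at all, which points at submission-side problems (sbatch errors, node state) rather than the script itself; `scontrol show job <jobid>` and the slurmd log on the allocated node are the next places to look.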

[slurm-users] SLURM Release 20.02rc1 -> Problem with OpenMPI......

2020-02-04 Thread Matthias Krawutschke
Hello together, I have a small question about SLURM 20.02pre1 and compiling. If I compile this version it looks fine and I've got no error message back. This is my command line to configure: ./configure --with-munge=/usr/bin/ --with-pmix=/usr/local/openpmix:/usr/local/lib/openmpi --enabl
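Whether PMIx support actually made it into the resulting binaries can be verified after installation; this check is a suggestion, not something from the original post:

```
# list the MPI plugin types this Slurm build supports;
# a pmix entry should appear if --with-pmix took effect
srun --mpi=list
```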

Re: [slurm-users] sbatch script won't accept --gres that requires more than 1 gpu

2020-02-04 Thread Marcus Wagner
Hi Dean, could you please try to restart the slurmctld? This usually helps at our site. I never saw this happen with gres, but many other times. This is why we restart slurmctld once a day via a cron job. Best, Marcus. On 2/4/20 12:59 AM, Dean Schulze wrote: When I run an sbatch script with t
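The daily restart mentioned can be expressed as a system crontab entry; the file path and time of day below are assumptions, not details from the post:

```
# /etc/cron.d/slurmctld-restart — illustrative; pick a quiet hour
30 4 * * *  root  systemctl restart slurmctld
```

Note that slurmctld preserves job state across a clean restart, so a scheduled restart does not kill running or queued jobs.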