Re: [slurm-users] (no subject)

2021-07-30 Thread Chris Samuel
On Friday, 30 July 2021 11:21:19 AM PDT Soichi Hayashi wrote: > I am running slurm-wlm 17.11.2 You are on a truly ancient version of Slurm there, I'm afraid (there have been 4 major releases and over 13,000 commits since that was tagged in January 2018); I would strongly recommend you try and get

Re: [slurm-users] [External] Re: Down nodes

2021-07-30 Thread Soichi Hayashi
Brian, yes, slurmd is not running on that node because the node itself is not there anymore (the whole VM is gone!). When the node is no longer in use, Slurm automatically runs the slurm_suspend.sh script, which removes the whole node (VM) by running "openstack server delete $host". There is no server/V
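For context on the elastic setup described above: Slurm's power-saving mechanism calls the configured SuspendProgram with the nodes to power down passed as a single hostlist expression argument (for example `node[1-3]`). The following is a minimal, hypothetical Python equivalent of a suspend script like slurm_suspend.sh; it is not the poster's script. It assumes each OpenStack server is named exactly after its Slurm node, and that the `scontrol` and `openstack` CLIs are on PATH with OpenStack credentials already loaded. The function names are illustrative.

```python
import subprocess
import sys


def expand_hostlist(hostlist: str) -> list[str]:
    # Slurm passes a compressed hostlist such as "node[1-3]";
    # `scontrol show hostnames` expands it to one hostname per line.
    out = subprocess.run(
        ["scontrol", "show", "hostnames", hostlist],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def delete_commands(hostnames: list[str]) -> list[list[str]]:
    # Assumption: the OpenStack server name equals the Slurm node name.
    return [["openstack", "server", "delete", host] for host in hostnames]


def suspend_nodes(hostlist: str) -> int:
    """Delete the VM behind every node in `hostlist`; return the failure count."""
    failures = 0
    for cmd in delete_commands(expand_hostlist(hostlist)):
        # check=False so one failed delete does not stop the remaining ones.
        if subprocess.run(cmd, check=False).returncode != 0:
            print(f"failed: {' '.join(cmd)}", file=sys.stderr)
            failures += 1
    return failures
```

As a SuspendProgram, the script's entry point would call something like `sys.exit(1 if suspend_nodes(sys.argv[1]) else 0)`. Once the VM is deleted, slurmd on that node is necessarily gone too, which is why the controller can no longer reach it.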

Re: [slurm-users] Down nodes

2021-07-30 Thread Brian Andrus
That 'not responding' is the issue, and it usually means one of two things: 1) slurmd is not running on the node, or 2) something on the network is blocking communication between the node and the master (firewall, SELinux, congestion, bad NIC, routes, etc.). Brian Andrus On 7/30/2021 3:51 PM, Soichi Ha
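One rough way to tell those two cases apart from the controller is a TCP probe of the node's slurmd port. This Python sketch assumes the default slurmd port 6818 (SlurmdPort in slurm.conf can override it); the function name `probe_slurmd` is hypothetical. A refused connection usually means nothing is listening (case 1), while a timeout more often points at something on the network silently dropping packets (case 2). A firewall that rejects rather than drops can also produce a refusal, so treat the result as a hint, not a diagnosis.

```python
import socket

# Default slurmd listening port; SlurmdPort in slurm.conf may override it.
SLURMD_PORT = 6818


def probe_slurmd(host: str, port: int = SLURMD_PORT, timeout: float = 3.0) -> str:
    """Return "open", "refused", "timeout", or "error" for a TCP connect attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"  # something (normally slurmd) accepted the connection
    except ConnectionRefusedError:
        return "refused"  # host reachable, but nothing listening on the port
    except socket.timeout:
        return "timeout"  # no answer at all: often a firewall or routing issue
    except OSError:
        return "error"  # e.g. name resolution failure or no route to host
```

For a node whose VM has already been deleted, as in this thread, expect something other than `open` (typically `error` if the name no longer resolves, or `timeout`).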

Re: [slurm-users] Down nodes

2021-07-30 Thread Soichi Hayashi
Brian, thank you for your reply, and thanks for setting the email subject. I forgot to edit it before I sent it! I am not sure how to reply to your reply, but I hope this makes it to the right place. I've updated slurm.conf to increase the controller debug level > SlurmctldDebug=5 I now s

Re: [slurm-users] History of pending jobs

2021-07-30 Thread Ole Holm Nielsen
On 30-07-2021 20:42, Glenn (Gedaliah) Wolosh wrote: I'm interested in getting an idea of how long jobs were pending in a particular partition. Is there any magic in sreport or sacct that can generate this info? I could also use something like "sreport cluster utilization" broken down by partitio

Re: [slurm-users] History of pending jobs

2021-07-30 Thread Fulcomer, Samuel
XDMoD can do that for you, but bear in mind that wait/pending time by itself may not be particularly useful. Consider the extreme scenario in which a user is only allowed to use one node at a time, but submits a thousand one-day jobs. Without any other competition for resources, the average wait/p
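To make the extreme scenario concrete, here is a worked illustration under stated assumptions (all N = 1000 jobs submitted at the same moment, each running exactly one day, each starting as soon as its predecessor finishes, and no other users). Job k (counting from 0) then waits k days, so the mean wait is:

```latex
\bar{W} = \frac{1}{N}\sum_{k=0}^{N-1} k\ \text{days}
        = \frac{N-1}{2}\ \text{days}
        = \frac{999}{2}\ \text{days}
        = 499.5\ \text{days} \qquad (N = 1000)
```

A mean wait of well over a year arises here purely from the per-user limit, not from contention, which is why wait time by itself can mislead.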

[slurm-users] History of pending jobs

2021-07-30 Thread Glenn (Gedaliah) Wolosh
I'm interested in getting an idea of how long jobs were pending in a particular partition. Is there any magic in sreport or sacct that can generate this info? I could also use something like "sreport cluster utilization" broken down by partition. Any help would be appreciated.
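One way to approach this with sacct alone is to compute Start minus Submit for each job. The sketch below assumes pipe-delimited output from something like `sacct -a -X -n -P -S 2021-07-01 -o JobID,Partition,Submit,Start` (add `-r <partition>` to restrict it to one partition). The helper name and the choice of hours as the unit are illustrative. Jobs that never started (Start not a timestamp, e.g. "Unknown") are skipped.

```python
from collections import defaultdict
from datetime import datetime


def mean_wait_by_partition(sacct_output: str) -> dict[str, float]:
    """Mean pending time in hours per partition.

    Expects lines of the form JobID|Partition|Submit|Start, as produced
    by `sacct -P -o JobID,Partition,Submit,Start` with ISO timestamps.
    """
    waits = defaultdict(list)
    for line in sacct_output.splitlines():
        fields = line.strip().split("|")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        _jobid, partition, submit, start = fields
        try:
            t_submit = datetime.fromisoformat(submit)
            t_start = datetime.fromisoformat(start)
        except ValueError:
            continue  # e.g. Start == "Unknown" for jobs still pending
        waits[partition].append((t_start - t_submit).total_seconds() / 3600)
    return {p: sum(w) / len(w) for p, w in waits.items()}
```

Substituting the Eligible field for Submit would exclude time a job spent held or waiting on dependencies, which may be closer to "time spent waiting for resources".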

[slurm-users] (no subject)

2021-07-30 Thread Soichi Hayashi
Hello. I need help troubleshooting our Slurm cluster. I am running slurm-wlm 17.11.2 on Ubuntu 20 on a public cloud infrastructure (Jetstream) using an elastic computing mechanism (https://slurm.schedmd.com/elastic_computing.html). Our cluster works for the most part, but for some reason,