Loris Bennett <loris.benn...@fu-berlin.de> writes:

> Hello David,
>
> David Baker <d.j.ba...@soton.ac.uk> writes:
>
>> Hello,
>>
>> I've taken a very good look at our cluster, but as yet have not made
>> any significant changes. The one change I did make was to increase
>> the "jobsizeweight". That is now our dominant parameter, and it does
>> ensure that our largest jobs (> 20 nodes) make it to the top of the
>> sprio listing, which is what we want to see.
>>
>> These large jobs aren't making any progress despite the priority
>> boost. I additionally decreased the nice value of the job that
>> sparked this discussion. That is (looking at sprio), there is a
>> 32-node job with a very high priority...
>>
>>   JOBID PARTITION  USER     PRIORITY     AGE  FAIRSHARE  JOBSIZE  PARTITION  QOS    NICE
>>  280919 batch      mep1c10   1275481  400000      59827   415655          0    0 -400000
>>
>> That job has been sitting in the queue for well over a week, and it
>> is disconcerting that we never see nodes becoming idle in order to
>> service these large jobs. Nodes do become idle and then get scooped
>> up by jobs started via backfill. Looking at the slurmctld logs, I
>> see that the vast majority of jobs are being started via backfill --
>> including, for example, a 24-node job. I see very few jobs allocated
>> by the main scheduler. That is, messages like "sched: Allocate
>> JobId=6915" are few and far between, and I never see any of the
>> large jobs being allocated in the batch queue.
>>
>> Surely this is not correct; does anyone have any advice on what to
>> check, please?
>
> Have you looked at what 'sprio' says? I usually want to see the list
> sorted by priority and so call it like this:
>
> sprio -l -S "%Y"
This should be sprio -l -S "Y"

[snip (242 lines)]

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin
Email loris.benn...@fu-berlin.de
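As a quick way to quantify the "almost everything starts via backfill" observation from the slurmctld log, one can simply count the two kinds of start messages. A minimal sketch follows; the log excerpt is entirely illustrative (made-up timestamps and job IDs), and the exact wording of the messages varies between Slurm versions, so check what your own slurmctld actually emits before relying on the patterns.

```shell
# Illustrative slurmctld log excerpt (not real data); real logs may word
# these messages slightly differently depending on Slurm version.
log='[2015-11-02T10:00:01] sched: Allocate JobId=280901
[2015-11-02T10:00:05] backfill: Started JobId=280902 in partition batch
[2015-11-02T10:01:12] backfill: Started JobId=280903 in partition batch
[2015-11-02T10:02:47] backfill: Started JobId=280904 in partition batch'

# Count starts made by the main scheduler vs. the backfill scheduler.
main=$(printf '%s\n' "$log" | grep -c 'sched: Allocate JobId')
bf=$(printf '%s\n' "$log" | grep -c 'backfill: Started JobId')

echo "main=$main backfill=$bf"
```

On a live system the same counts would come from grepping the actual slurmctld log file; a large backfill-to-main ratio together with a stuck high-priority job usually points at the scheduler never being able to reserve enough idle nodes for the big job.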