Thanks, Paul, for taking the time to look further into this. You are indeed correct: adding a default mode (which is then overridden by each partition's setting) keeps Slurm happy with that configuration. Moreover, after restarting the daemons etc. per the documentation, everything seems to be working as I intended. I still need to run a few more tests, especially for edge cases, but adding that default seems to have completely fixed the problem.
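
For anyone finding this thread in the archives, here is a sketch of what the preemption-related part of slurm.conf looks like after the fix. I am assuming CANCEL as the cluster-wide default here (REQUEUE, as in Paul's setup, should work just as well); the partition lines are unchanged from my original message quoted below:

# Governs the default preemption behavior; each partition below overrides PreemptMode
PreemptType=preempt/partition_prio
PreemptMode=CANCEL

PartitionName=regular DefMemPerCPU=4580 Default=True Nodes=node[01-12] State=UP PreemptMode=off PriorityTier=200
PartitionName=All DefMemPerCPU=4580 Nodes=node[01-36] State=UP PreemptMode=off PriorityTier=500
PartitionName=lowpriority DefMemPerCPU=4580 Nodes=node[01-36] State=UP PreemptMode=cancel PriorityTier=100

The key point, as Paul noted, seems to be that with preempt/partition_prio the cluster-wide PreemptMode must not be left at its OFF default, even if individual partitions set PreemptMode=off.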
Thanks again and have a great weekend!

On Fri, Jan 12, 2024 at 8:49 AM Paul Edmon <ped...@cfa.harvard.edu> wrote:

> My concern was your config inadvertently having that line commented out and then seeing problems. If it wasn't, then no worries at this point.
>
> We run preempt/partition_prio on our cluster and have a mix of partitions using PreemptMode=OFF and PreemptMode=REQUEUE, so I know that combination works. I would be surprised if PreemptMode=CANCEL did not work, as that's a valid option.
>
> Something we do have set, though, is the default mode. We have:
>
> ### Governs default preemption behavior
> PreemptType=preempt/partition_prio
> PreemptMode=REQUEUE
>
> So you might try setting a default of PreemptMode=CANCEL and then setting specific PreemptModes for all your partitions. That's what we do and it works for us.
>
> -Paul Edmon-
>
> On 1/12/2024 10:33 AM, Davide DelVento wrote:
>
> Thanks Paul,
>
> I don't understand what you mean by having a typo somewhere. I mean, that configuration works just fine right now, whereas if I add the commented-out line, any slurm command will just abort with the error "PreemptType and PreemptMode values incompatible". So, assuming there is a typo, it should be in the commented line, right? Or are you saying that having that line makes slurm sensitive to a typo somewhere else that would otherwise be ignored? Obviously I can't exclude that option, but it seems unlikely to me, also because the error does say these two things are incompatible.
>
> It would obviously be much better if the error said what EXACTLY is incompatible with what, but in the documentation at https://slurm.schedmd.com/preempt.html I see many clues as to what that could be, and hence I am asking people here who may have already deployed preemption on their systems. Some excerpts from that URL:
>
> *PreemptType*: Specifies the plugin used to identify which jobs can be preempted in order to start a pending job.
>
> - *preempt/none*: Job preemption is disabled (default).
> - *preempt/partition_prio*: Job preemption is based upon partition *PriorityTier*. Jobs in higher PriorityTier partitions may preempt jobs from lower PriorityTier partitions. This is not compatible with *PreemptMode=OFF*.
>
> which somewhat makes it sound like all partitions should have preemption set, and not only some? I obviously have some "off" partitions. However, elsewhere in that document it says
>
> *PreemptMode*: Mechanism used to preempt jobs or enable gang scheduling. When the *PreemptType* parameter is set to enable preemption, the *PreemptMode* in the main section of slurm.conf selects the default mechanism used to preempt the preemptable jobs for the cluster. *PreemptMode* may be specified on a per partition basis to override this default value if *PreemptType=preempt/partition_prio*.
>
> which kind of sounds like it should be okay (unless it means **everything** must be different from OFF). Yet elsewhere on that same page it says
>
> On the other hand, if you want to use *PreemptType=preempt/partition_prio* to allow jobs from higher PriorityTier partitions to Suspend jobs from lower PriorityTier partitions, then you will need overlapping partitions, and *PreemptMode=SUSPEND,GANG* to use Gang scheduler to resume the suspended job(s). In either case, time-slicing won't happen between jobs on different partitions.
>
> which somewhat sounds like only suspend and gang can be used as preemption modes, and not cancel (my preference) or requeue (perhaps acceptable, if I jump through some hoops).
>
> So to me the documentation is highly confusing about what can or cannot be used together with what else, and the examples at the bottom of the page are nice, but they do not specify the full settings. In particular, this one https://slurm.schedmd.com/preempt.html#example2 is close enough to mine, but it does not say which PreemptType has been chosen (nor whether "cancel" would be allowed in that setup).
>
> Thanks again!
>
> On Fri, Jan 12, 2024 at 7:22 AM Paul Edmon <ped...@cfa.harvard.edu> wrote:
>
>> At least in the example you are showing, you have PreemptType commented out, which means it will use the default. PreemptMode Cancel should work; I don't see anything in the documentation that indicates it wouldn't. So I suspect you have a typo somewhere in your conf.
>>
>> -Paul Edmon-
>>
>> On 1/11/2024 6:01 PM, Davide DelVento wrote:
>>
>> I would like to add a preemptable queue to our cluster. Actually, I already have. We simply want jobs submitted to that queue to be preempted if there are no resources available for jobs in other (high-priority) queues. Conceptually very simple: no conditionals, no choices, just what I wrote. However, it does not work as desired.
>>
>> This is the relevant part:
>>
>> grep -i Preemp /opt/slurm/slurm.conf
>> #PreemptType = preempt/partition_prio
>> PartitionName=regular DefMemPerCPU=4580 Default=True Nodes=node[01-12] State=UP PreemptMode=off PriorityTier=200
>> PartitionName=All DefMemPerCPU=4580 Nodes=node[01-36] State=UP PreemptMode=off PriorityTier=500
>> PartitionName=lowpriority DefMemPerCPU=4580 Nodes=node[01-36] State=UP PreemptMode=cancel PriorityTier=100
>>
>> That PreemptType setting (now commented out) completely breaks Slurm; everything refuses to run with errors like
>>
>> $ squeue
>> squeue: error: PreemptType and PreemptMode values incompatible
>> squeue: fatal: Unable to process configuration file
>>
>> If I understand the documentation at https://slurm.schedmd.com/preempt.html correctly, that is because preemption cannot cancel jobs based on partition priority, which (if true) is really unfortunate. I understand that allowing cross-partition time-slicing could be tricky, and so I understand why that isn't allowed, but cancelling? Anyway, I have a few questions:
>>
>> 1) is that correct, and so should I avoid using either partition priority or cancelling?
>> 2) is there an easy way to trick slurm into requeueing and then have those jobs cancelled instead?
>> 3) I guess the cleanest option would be to implement QoS, but I've never done it and we don't really need it for anything other than this. The documentation looks complicated, but is it? The great Ole's website is unavailable at the moment...
>>
>> Thanks!!