Hello experts,

I hope someone out there has experience with "ActiveFeatures" and "AvailableFeatures" in the node configuration and can give some advice.

We have configured 4 nodes with certain features, e.g.

"NodeName=thin1 Arch=x86_64 CoresPerSocket=24
   CPUAlloc=0 CPUTot=96 CPULoad=44.98
   AvailableFeatures=work,scratch
   ActiveFeatures=work,scratch

..."

The features correspond to the filesystems mounted on the nodes. We are about to take one filesystem (work) offline for maintenance, so we wanted to remove that feature from the nodes. We tried e.g.

# scontrol update node=thin1 ActiveFeatures="scratch"

resulting in

"NodeName=thin1 Arch=x86_64 CoresPerSocket=24
   CPUAlloc=0 CPUTot=96 CPULoad=44.98
   AvailableFeatures=work,scratch
   ActiveFeatures=scratch

..."
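
In case it helps anyone reproduce this: the update above replaces the whole ActiveFeatures list, so the reduced list for a node can be derived instead of typed by hand. Below is a minimal sketch, assuming a POSIX shell. The helper name remove_feature is made up for illustration and is not part of Slurm, and the scontrol command is only echoed, not executed.

```shell
#!/bin/sh
# Print a comma-separated feature list with one feature removed.
# Usage: remove_feature LIST FEATURE   (hypothetical helper, not a Slurm tool)
remove_feature() {
    # Split on commas, drop exact matches of the feature, re-join with commas.
    printf '%s\n' "$1" | tr ',' '\n' | grep -vxF -- "$2" | paste -sd, -
}

# Dry run: print the scontrol command instead of executing it.
# Remove the leading "echo" to actually apply the change.
node=thin1
current=work,scratch   # ActiveFeatures as reported by "scontrol show node"
echo scontrol update NodeName="$node" ActiveFeatures="$(remove_feature "$current" work)"
```

For thin1 this prints the same update as the one we ran by hand, with ActiveFeatures=scratch.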

The problem now is that jobs requesting the feature work can no longer be SUBMITTED; the error we get is

"sbatch: error: Batch job submission failed: Requested node configuration is not available"


Does this make sense? We want our users to be able to submit jobs requesting features that are available in general: maintenance windows usually don't last long, and since our queuing times are rather long, users want to submit jobs now so that they can run once the feature is active again. I understand that a job might be rejected when the feature is not available at all, but not when it is merely not active. Furthermore, 4-node jobs are also rejected at submission when the feature is active on only 3 nodes. Is this a bug? Wouldn't it make more sense for the job to just sit in the queue, waiting for the features/resources to be activated again?

Maybe someone has an idea how to handle this problem?

Thanks,

Alexander





