On 6/2/22 14:02, tluchko wrote:
Hello,
I have recently started to have problems where jobs sit in the queue
waiting for resources to become available, even when the resources are
available. If I stop and restart slurmctld, the pending jobs start running.
This seems to be related to GRES jobs.
tluchko writes:
> Jobs only sit in the queue with RESOURCES as the REASON when we
> include the flag --gres=bandwidth:ib. If we remove the flag, the jobs
> run fine. But we need the flag to ensure that we don't get a mix of IB
> and ethernet nodes because they fail in this case.
This doesn't ans