Hello all,
We have a requirement to run a large number of job arrays (~10k arrays, each with at most 8M elements), all with the same priority, with minimal starvation of any array. We don't want to wait for each array to complete before starting the next one; instead we wish to implement "interleaving" between arrays. We came up with the following scheme:
1. Start all arrays in this partition in a "Hold" state.
2. Release a predefined number of elements (e.g., 200).
From this point a slurmctld prolog takes over:
3. On the 200th job, run squeue and note the next job array (the array id following the currently executing array id).
4. Release a predefined number of elements (e.g., 200) and repeat.
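To make the intended release order concrete, here is a minimal sketch in Python. It only computes which batch of which array would be released next; it does not talk to Slurm. The function names, the example array ids and sizes, and the assumption that array indices are contiguous and start at 0 are all hypothetical. The job-spec string built in release_command (ARRAYID_[first-last]) is also an assumption that should be checked against the scontrol man page for your Slurm version before use.

```python
from collections import deque
from typing import Dict, Iterator, List, Tuple


def interleaved_batches(
    array_sizes: Dict[int, int], batch: int = 200
) -> Iterator[Tuple[int, int, int]]:
    """Yield (array_id, first_index, last_index) in round-robin order.

    array_sizes maps a (hypothetical) array job id to its element count.
    Indices are assumed to be contiguous and to start at 0.
    """
    if batch <= 0:
        raise ValueError("batch must be positive")
    # Queue of (array_id, next unreleased index), ordered by array id.
    pending = deque(
        (aid, 0) for aid in sorted(array_sizes) if array_sizes[aid] > 0
    )
    while pending:
        aid, start = pending.popleft()
        end = min(start + batch, array_sizes[aid]) - 1
        yield aid, start, end
        # Re-queue the array at the back if it still has held elements.
        if end + 1 < array_sizes[aid]:
            pending.append((aid, end + 1))


def release_command(array_id: int, first: int, last: int) -> List[str]:
    """Build (but do not run) a release command for one batch.

    The ARRAYID_[first-last] job-spec format is an assumption; verify it
    against your scontrol documentation.
    """
    return ["scontrol", "release", f"{array_id}_[{first}-{last}]"]


if __name__ == "__main__":
    # Made-up array ids and sizes, for illustration only.
    sizes = {101: 450, 102: 200, 103: 250}
    for aid, first, last in interleaved_batches(sizes, batch=200):
        print(" ".join(release_command(aid, first, last)))
```

In the scheme described above, each step of this loop would be triggered from the slurmctld prolog when the last job of the previous batch starts, rather than being run all at once. Releasing a whole batch with a single command per array, if the job-spec syntax allows it, would also keep the number of requests to the controller to one per batch instead of one per element.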
This might produce a very large number of release requests to the scheduler in a short time frame, and one concern is that the scheduler loop will receive too many requests. Can you think of other issues that might come up with this approach?
Do you have any recommendations, or can you suggest a better approach to solve this problem?
We have considered fairshare, but all arrays are from the same account and user. We have also considered creating accounts on the fly (one for each array), but we get an error ("This should never happen") after creating a few thousand accounts. To my understanding, fairshare is only viable between accounts.
- [slurm-users] Can frequent hold-release adversely affect slur... Daniel Letai