Hi David,

(Thanks for changing the subject to something more appropriate).
David Laehnemann <david.laehnem...@hhu.de> writes:

> Yes, but only to an extent. The linked conversation ends with this:
>
>>> Do you have any best practice about setting MaxJobCount to a proper
>>> number?
>
>> That depends upon your workload. You could probably set MaxJobCount
>> to at least 50000 with most systems (assuming you have at least a few
>> gigabytes of memory). Some sites run with a value of 1000000 or more.
>
> So, it is configurable. But this has a limit. And if you have lots of
> users on a system submitting lots of jobs, even a value of 1000000 can
> get exhausted.

Yes, but you can start a lot more jobs and stay within the limit if
you use job arrays. When you submit individual jobs, a job ID for
each one needs to be written to the Slurm job database. This can
cause the database to become unresponsive if the number submitted at
one time, whether by snakemake or just a bash script looping over
'sbatch', is too high. If, on the other hand, you submit a job array,
only one entry needs to be made in the database immediately, with
entries for the elements of the array only being made when a job can
actually start.

A large number of individual jobs with the same resource requirements
can also prevent backfill from working properly. The backfill
mechanism only considers a certain (configurable) number of pending
jobs to see whether they qualify for backfilling. In this context, a
job array counts as a single job, regardless of how large the array
actually is, so individual jobs exhaust the backfill window much
faster. This degrades the throughput of the system and thus
negatively impacts all users. Therefore, on our system we would not
allow users to employ a mechanism which generates a large number of
jobs but does not use job arrays.

> And in either case, this is not something that speaks against a
> workflow management system giving you additional control over things.
> So I'm not sure what exactly we are arguing about, right here...

I just wanted to point out that, whereas approaches such as snakemake
obviously scratch a very important itch for some users, they may
cause issues for people running HPC systems, and indeed for users who
don't use such mechanisms.

Cheers,

Loris

> cheers,
> david
>
> On Thu, 2023-02-23 at 17:41 +0100, Ole Holm Nielsen wrote:
>> On 2/23/23 17:07, David Laehnemann wrote:
>>> In addition, there are very clear limits to how many jobs slurm can
>>> handle in its queue, see for example this discussion:
>>> https://bugs.schedmd.com/show_bug.cgi?id=2366
>>
>> My 2 cents: Slurm's job limits are configurable, see this Wiki page:
>> https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configuration/#maxjobcount-limit
>>
>> /Ole

--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
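
P.S. For anyone following along who has not used job arrays before,
here is a minimal sketch of an array job script; the job name, script
body, file names and array size are just illustrative placeholders:

    #!/bin/bash
    #SBATCH --job-name=example-array
    #SBATCH --time=01:00:00
    #SBATCH --mem=2G
    # 1000 elements (indices 0-999), at most 50 running at any one time
    #SBATCH --array=0-999%50

    # Each element is started with its own SLURM_ARRAY_TASK_ID, which
    # can be used to select that element's input file.
    ./process_data "input_${SLURM_ARRAY_TASK_ID}.dat"

A single 'sbatch' of this script creates one pending entry in the
queue, whereas a bash loop calling 'sbatch' once per input file would
create 1000.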
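
P.P.S. For reference, the two limits discussed above are set in
slurm.conf; the values below are purely illustrative, not
recommendations:

    # slurm.conf
    # Maximum number of job records slurmctld keeps in memory:
    MaxJobCount=50000
    # Maximum number of pending jobs the backfill scheduler examines
    # per scheduling cycle:
    SchedulerParameters=bf_max_job_test=1000

As described above, a pending job array counts as a single job against
both of these limits until its elements actually start.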