As a side note: Slurm 23.x added a rate-limiting feature for client RPC calls (see this commit: https://github.com/SchedMD/slurm/commit/674f118140e171d10c2501444a0040e1492f4eab#diff-b4e84d09d9b1d817a964fb78baba0a2ea6316bfc10c1405329a95ad0353ca33e ). This gives operators a way to limit the negative effect of workflow managers on the scheduler.
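To illustrate the general idea, here is a minimal sketch of per-user rate limiting with a token bucket. This is not Slurm's code: the class names, the bucket size and the refill rate are all hypothetical, and I am only assuming that a per-user token bucket is a reasonable mental model for this kind of limiter. Each user can send a short burst of requests, after which further requests are rejected until the bucket refills.

```python
class TokenBucket:
    """One bucket per client; all parameter values are illustrative."""

    def __init__(self, capacity, refill_rate, now):
        self.capacity = capacity        # hypothetical burst size
        self.refill_rate = refill_rate  # hypothetical tokens added per second
        self.tokens = float(capacity)   # a new bucket starts full
        self.last = now                 # time of the last update

    def allow(self, now):
        # Refill according to elapsed time, never above capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        # Spend one token per request if one is available.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerUserLimiter:
    """Keeps an independent bucket for each user (hypothetical helper)."""

    def __init__(self, capacity=30, refill_rate=2.0):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.buckets = {}

    def allow(self, user, now):
        bucket = self.buckets.get(user)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_rate, now)
            self.buckets[user] = bucket
        return bucket.allow(now)


if __name__ == "__main__":
    limiter = PerUserLimiter(capacity=3, refill_rate=1.0)
    # A workflow manager polling in a tight loop exhausts its burst quickly...
    print([limiter.allow("workflow", now=0.0) for _ in range(5)])
    # ...while an interactive user is unaffected.
    print(limiter.allow("alice", now=0.0))
```

The point of the per-user split is that a misbehaving workflow manager only throttles itself; other users keep their own budget. A rejected client is expected to back off and retry later.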
On Mon, Feb 27, 2023 at 4:57 PM Davide DelVento <davide.quan...@gmail.com> wrote:
> > > And if you are seeing a workflow management system causing trouble on
> > > your system, probably the most sustainable way of getting this resolved
> > > is to file issues or pull requests with the respective project, with
> > > suggestions like the ones you made. For snakemake, a second good place
> > > to chime in currently would be the issue discussing Slurm job array
> > > support: https://github.com/snakemake/snakemake/issues/301
> >
> > I have to disagree here. I think the onus is on the people in a given
> > community to ensure that their software behaves well on the systems they
> > want to use, not on the operators of those systems. Those of us running
> > HPC systems often have to deal with a very large range of different
> > pieces of software, and time and personnel are limited. If some program
> > used by only a subset of the users is causing disruption, then it
> > already costs us time and energy to mitigate those effects. Even if I
> > had the appropriate skill set, I don't see myself writing many
> > patches for workflow managers any time soon.
>
> As someone who has worked in both roles (and to a degree still does) and
> can therefore better understand the perspective of both parties, I
> side more with David than with Loris here.
>
> Yes, David wrote "or pull requests", but that's an OR.
>
> Loris, if you know of or experience a problem, it takes close to zero
> time to file a bug report educating the author of the software about
> the problem (or pointing them to places where they can educate
> themselves). Otherwise they will never know about it, they will never
> fix it, and potentially they think it's fine and will make the problem
> worse. Yes, you could alternatively forbid the use of the problematic
> software on the machine (I've done that on our systems), but users
> with those needs will find ways to create the very same problem, and
> perhaps worse, in other ways (they have done it on our system). Yes,
> time is limited, and as operators of HPC systems we often don't have
> the time to understand all the nuances and needs of all the users, but
> that's not the point I am advocating. In fact it does seem to me that
> David is putting the onus on himself and his community to make the
> software behave correctly, and he is trying to educate himself about
> what "correct" looks like. So just give him the input he's looking for,
> both here and (if and when snakemake causes trouble on your system)
> by opening tickets on that repo, explaining the problem (definitely
> not writing a PR for you, sorry David)