On Thu, 1 Apr 2021, Jean St-Laurent via NANOG wrote:
What happened is that it would create a kind of internal DDoS and they would
all time out and give a weird error message. Something very useful like
Error Code 0x8098808 Please call our support line at this phone number.
If only there was a way to address the Thundering Herd problem before the
cloud. :)
This simple change, 3 lines of code adding a random artificial boot
penalty of a few seconds, completely solved the problem.
Bingo. Now, the trick is to catch this before it causes a self-DDoS.
This is a problem that has been recognised for decades and this is
unfortunately a good example of how operational experience is still not being
distributed properly. Too many managers think that operational work is obvious
and just a result of common sense. It isn't.
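For anyone who hasn't run into it, the random boot penalty quoted above might look roughly like the sketch below. This is not the code from the original fix, which wasn't posted. The names startup_jitter and boot, the 5-second ceiling, and the connect callback are all made up for illustration.

```python
import random
import time
from typing import Callable, Optional


def startup_jitter(max_delay_s: float = 5.0,
                   rng: Optional[random.Random] = None) -> float:
    """Pick a random delay, in seconds, between 0 and max_delay_s."""
    if rng is None:
        rng = random.Random()
    return rng.uniform(0.0, max_delay_s)


def boot(connect: Callable[[], object],
         max_delay_s: float = 5.0,
         sleep: Callable[[float], None] = time.sleep) -> object:
    """Wait a random interval before the first call to the backend.

    Spreading clients over the window keeps them from all hitting the
    service in the same instant after a mass restart.
    (connect is a hypothetical stand-in for the real first request.)
    """
    sleep(startup_jitter(max_delay_s))
    return connect()
```

The window has to be sized against how many clients come back at once and how much the backend can absorb per second. A few seconds was enough in the case described, but that number is specific to that deployment.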
Cheers,
Rob