On Tue 07-07-20 07:43:48, Qian Cai wrote:
> 
> > On Jul 7, 2020, at 6:28 AM, Michal Hocko <mho...@kernel.org> wrote:
> > 
> > Would you have any examples? Because I find this highly unlikely.
> > OVERCOMMIT_NEVER only works when virtual memory is not largely
> > overcommitted wrt real memory demand. And that tends to be more of
> > an exception rather than a rule. "Modern" userspace (whatever that
> > means) tends to be really hungry with virtual memory which is only used
> > very sparsely.
> > 
> > I would argue that either somebody is running an "OVERCOMMIT_NEVER"
> > friendly SW and this is a permanent setting or this is not used at all.
> > At least this is my experience.
> > 
> > So I strongly suspect that LTP test failure is not something we should
> > really lose sleep over. It would be nice to find a way to flush existing
> > batches but I would rather see a real workload that would suffer from
> > this imprecision.
> 
> I hear you many times that you really don’t care about those use
> cases unless you hear exactly people are using in your world.
> 
> For example, when you said LTP oom tests are totally artificial last
> time and how less you care about if they are failing, and I could only
> enjoy their efficiencies to find many issues like race conditions
> and bad error accumulation handling etc that your “real world use
> cases” are going to take ages or no way to flag them.
Yes, they are effective at hitting corner cases and that is fine. I am
not dismissing their usefulness. I have tried to explain that many
times, but let me try again. Seeing a corner case and thinking about a
potential fix is one thing. On the other hand, it is not really ideal
to treat such a failure as a hard regression and to consider reverting
otherwise useful functionality/improvements without a proper
cost-benefit analysis.

Sure, having corner cases is not really nice, but look at this example
again. The overcommit setting is global, and it is hard to change it at
runtime willy-nilly, because that might have really detrimental side
effects on all running workloads. So it is quite reasonable to expect
that the mode is switched either early after boot or when the system is
in a quiescent state, with almost nothing but the very core services
running, so the likelihood that the switch catches a busy workload is
low.

> There are just too many valid use cases in this wild world. The
> difference is that I admit that I don’t know or even aware all the
> use cases, and I don’t believe you do as well.

Me neither, and I am not claiming that. All I am saying is that the
real risk of a regression is low enough that I wouldn't lose sleep over
it. It is perfectly fine to address this proactively if the fix is
reasonably maintainable. I was mostly reacting to your pushing for a
revert based solely on LTP results. LTP is a very useful tool to raise
awareness of potential problems, but you shouldn't follow its results
blindly.

> If a patchset broke the existing behaviors that written exactly in
> the spec, it is then someone has to prove its innocent. For example,
> if nobody is going to rely on something like this now and future, and
> then fix the spec and explain exactly nobody should be rely upon.

I am all for clarifications in the documentation.

-- 
Michal Hocko
SUSE Labs