On Tue, Oct 4, 2016 at 7:43 PM, Paul Gortmaker <paul.gortma...@windriver.com> wrote: > > A couple years ago Ingo had an idea to kill BUG_ON abuse that I made > a 1st pass at. Back then it seemed nobody cared. Maybe that has since > changed? > > https://lkml.org/lkml/2014/4/30/359
So we actually have the checkpatch warning already: # avoid BUG() or BUG_ON() if ($line =~ /\b(?:BUG|BUG_ON)\b/) { my $msg_type = \&WARN; $msg_type = \&CHK if ($file); &{$msg_type}("AVOID_BUG", "Avoid crashing the kernel - try using WARN_ON & recovery code rather than BUG() or BUG_ON()\n" . $herecurr); } but it doesn't trigger on VM_BUG_ON(). And I'm not convinced about replacing things with BUG_ON_AND_HALT(), it simply doesn't fix the existing issue we have: people use BUG_ON(), and worse, _when_ they use BUG_ON(), they use it instead of error handling, so the code _around_ the BUG_ON() tends to then very much depend on what the BUG_ON() checks. This is actually one way that VM_BUG_ON() is better: it's very much by design something that can be compiled away, so at least hopefully nobody thinks of it as a security measure. So we could just say that we will treat VM_BUG_ON() as a WARN_ON_ONCE(), and just not kill the machine. Because I could easily imagine that somebody *does* treat BUG_ON() that way, thinking "well, if that BUG_ON() triggers at least it won't then go off the rails later". In fact, right now we mark BUG() in such a way that gcc can even take advantage of the "crash the machine" semantics, because we explicitly mark the BUG() with "unreachable()". So what I think we should think about is: - extending the checkpatch warning to VM_BUG_ON too, to discourage new users. - look at making BUG_ON() simply be less lethal. Remove the unrechable(), reorganize how the string is stored, and make it act more like WARN_ON_ONCE() instead (it's the "rewind_stack_do_exit()" that ends up causing us to try to kill things, we *could* just try to stop doing that). - Instead of adding a BUG_ON_AND_HALT(), we could perhaps add a new FATAL_ERROR() thing that acts like the current BUG_ON, and *not* call it something similar (we don't want people doing mindless conversions!). And that's the one that would do the whole rewind_stack_do_exit() to kill the process. But apart from the checkpatch thing, it's actually a pretty big change. Linus