Excerpts from Christophe Leroy's message of February 27, 2021 8:07 pm:
>
>
> On 25/02/2020 at 18:35, Nicholas Piggin wrote:
>> Implement the bulk of interrupt return logic in C. The asm return code
>> must handle a few cases: restoring full GPRs, and emulating stack store.
>>
>> The stack store emulation is significantly simplified, rather than creating
>> a new return frame and switching to that before performing the store, it
>> uses the PACA to keep a scratch register around to perform the store.
>>
>> The asm return code is moved into 64e for now. The new logic has made
>> allowance for 64e, but I don't have a full environment that works well
>> to test it, and even booting in emulated qemu is not great for stress
>> testing. 64e shouldn't be too far off working with this, given a bit
>> more testing and auditing of the logic.
>>
>> This is slightly faster on a POWER9 (page fault speed increases about
>> 1.1%), probably due to reduced mtmsrd.
>
>
> This series, and especially this patch, has added an awful number of
> BUG_ON() traps.
>
> We have an issue open at https://github.com/linuxppc/issues/issues/88
> since 2017 for reducing the number of BUG_ON()s.
>
> And the kernel Documentation is explicit on the willingness to deprecate
> BUG_ON(), see
> https://www.kernel.org/doc/html/latest/process/deprecated.html?highlight=bug_on :
>
> BUG() and BUG_ON()
> Use WARN() and WARN_ON() instead, and handle the “impossible” error
> condition as gracefully as possible. While the BUG()-family of APIs were
> originally designed to act as an “impossible situation” assert and to
> kill a kernel thread “safely”, they turn out to just be too risky. (e.g.
> “In what order do locks need to be released? Have various states been
> restored?”) Very commonly, using BUG() will destabilize a system or
> entirely break it, which makes it impossible to debug or even get viable
> crash reports. Linus has very strong feelings about this.
>
> So ... can we do something cleaner with all the BUG_ON()s recently added?

Yeah you're right. Some of it is probably overkill due to paranoia when
developing the series. Now we have a bit more confidence we could
probably look at cutting down on these.

I do get a bit concerned about detecting a problem in some code like
this and attempting to just continue, it usually means the system is
going to crash pretty badly anyway (and the WARN_ON trap interrupt is
probably going to finish you off anyway).

So I think removing the more obvious checks entirely (maybe with a PPC
DEBUG config option) is the right way to go.

Thanks,
Nick
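
For illustration only, here is a minimal userspace sketch of what "checks
behind a debug config option" could look like. Nothing here is taken from
the patch: IRQ_RETURN_DEBUG, DEBUG_WARN_ON(), debug_warn_report() and
check_exit_state() are all hypothetical names, and a compile-time define
stands in for a real Kconfig symbol. With the option off, the condition is
still type-checked but never evaluated, so the check costs nothing at
runtime.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical switch standing in for a Kconfig debug option.
 * Build with -DIRQ_RETURN_DEBUG=1 to compile the checks in.
 */
#ifndef IRQ_RETURN_DEBUG
#define IRQ_RETURN_DEBUG 0
#endif

/* Number of checks that have fired (only ever incremented in debug builds). */
unsigned long debug_warn_count;

#if IRQ_RETURN_DEBUG
/* Report a failed check and hand the condition back to the caller. */
static bool debug_warn_report(bool cond, const char *expr,
			      const char *file, int line)
{
	if (cond) {
		debug_warn_count++;
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	}
	return cond;
}
#define DEBUG_WARN_ON(cond) \
	debug_warn_report(!!(cond), #cond, __FILE__, __LINE__)
#else
/*
 * Checks compiled out: sizeof keeps the expression type-checked without
 * evaluating it, and the macro is a constant false.
 */
#define DEBUG_WARN_ON(cond) ((void)sizeof(!!(cond)), false)
#endif

/*
 * Hypothetical stand-in for an exit-path invariant.
 * Returns 0 normally, -1 if the debug check fired and we bailed out.
 */
int check_exit_state(unsigned long unexpected_state)
{
	if (DEBUG_WARN_ON(unexpected_state != 0))
		return -1;
	return 0;
}
```

In a default build check_exit_state() always returns 0 and the test folds
away; with -DIRQ_RETURN_DEBUG=1 a non-zero argument prints a warning and
returns -1.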