Hi Mark, Let me start by saying that I appreciate your well-reasoned response. (I think) I understand your reasoning, I appreciate your well-explained argument, and I respect your opinion. I just wanted to make that clear up front.
On Sun, Apr 22, 2018 at 1:11 PM, Mark Johnston <ma...@freebsd.org> wrote: > > > All too often, my ability to debug assertion violations is hindered because > > the system trips over yet another assertion while dumping the core. If we > > skip the assertion, nothing bad happens. (The post-panic debugging code > > already needs to deal with systems that are inconsistent, and it does a > > pretty good job at it.) > > I think we make a decent effort to fix such problems as they arise, but > you are declaring defeat on behalf of everyone. Did you make some effort > to fix or report these issues before resorting to the more drastic > measure taken here? We try to report or fix them as they arise. However, you don't know there is a problem until you actually run into it. And, you don't run into the problem until you can't get a core dump due to the assertion. (And, with elusive problems, it isn't always easy to duplicate them. So, fixing the assertion is sometimes "too late".) > > On the other hand, I really am not sure what you are worried might happen > > if we skip checking assertions after we've already panic'd. As far as I can > > tell, the likely worst case is that we hit a true panic of some kind. In > > that case, we're no worse off than before. > > > > I think the one obvious exception is when we're purposely trying to > > validate the post-panic debugging code. In that case, you can change the > > sysctl/tunable to enable troubleshooting. > > What about a user whose test system panics and fails to dump? With > assertions enabled, a developer has a better chance of spotting the > problem. Now we need at least one extra round trip to the user to > diagnose the problem, which may not be readily reproducible in the first > place. That's true. However, this is equally true in the other direction: Prior to this change, when a user tripped over an assertion and was unable to get a coredump due to a second post-panic assertion, it took (at least) another round-trip to get a coredump. First, without the capability to ignore assertions after a panic (introduced by this commit), you would need to fix the actual assertion to enable the user to get a coredump. At minimum, I think this change has value in that case. This change gives you a mechanism to get a coredump without requiring that you fix the assertion and get the user to recompile with the patch. But, moreover, if we change the default back to panic'ing on a second assertion, we will hamper our ability to get usable reports about elusive bugs. If we leave the default "as is", it won't take an extra round-trip to tell the user how to get a coredump. If we change the default (or, perhaps more correctly, "restore the prior default"), we will still need a second round-trip to get coredumps. That makes it tough to chase elusive bugs. > > I would honestly appreciate someone explaining the dangers in disabling a > > response to assertion violations after we've already panic'd and are simply > > trying to troubleshoot, because they are not obvious to me. But, I could > > simply be missing them. > > The assertions help identify code that is being executed during a dump > when it shouldn't be. In general we rely on users to opt in to running > INVARIANTS kernels because developers don't catch every single bug. With > this change it's harder to be confident in the kernel dump code. (Or in > any post-panic debugging code for that matter.) I can appreciate that. I am generally skeptical of the value of assertions in general-use code after a panic, since we already know the system is in an inconsistent/unexpected state. And, it is hard to predict all the various ways it could possibly be broken. However, I do recognize that there is code which specifically is written to run post-panic, and which has assertions which SHOULD be true, even after a panic. > I dislike the change and would prefer the default to be inverted. At the > very least I think we should print the assertion message rather than > returning silently from kassert_panic(). I still think this change has value (as described above). I can understand the argument for changing the default. In fact, after thinking about your email, I'm leaning towards doing that. But, I want to ponder it more today. If we leave the default alone, I agree we should print the assertion message (albeit with some rate limit). Jonathan _______________________________________________ svn-src-all@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"