On Sat, Sep 16, 2023 at 10:36:52AM +0200, Martin Uecker wrote: > > On Fri, Sep 15, 2023 at 08:18:28AM -0700, Andrew Pinski wrote: > > > On Fri, Sep 15, 2023 at 8:12 AM Qing Zhao <qing.z...@oracle.com> wrote: > > > > > > > > > > > > > > > > > On Sep 15, 2023, at 3:43 AM, Xi Ruoyao <xry...@xry111.site> wrote: > > > > > > > > > > On Thu, 2023-09-14 at 21:41 +0000, Qing Zhao wrote: > > > > >>>> CLANG already provided -fsanitize=unsigned-integer-overflow. GCC > > > > >>>> might need to do the same. > > > > >>> > > > > >>> NO. There is no such thing as unsigned integer overflow. That option > > > > >>> is badly designed and the GCC community has rejected a few times now > > > > >>> having that sanitizer before. It is bad form to have a sanitizer for > > > > >>> well defined code. > > > > >> > > > > >> Even though unsigned integer overflow is well defined, it might be > > > > >> unintentional, shall we warn user about this? > > > > > > > > > > *Everything* could be unintentional and should be warned then. GCC > > > > > is a > > > > > compiler, not an advanced AI educating the programmers. > > > > > > > > Well, you are right in some sense. -:) > > > > > > > > However, overflow is one important source for security flaws, it’s > > > > important for compilers to detect > > > > overflows in the programs in general. > > > > > > Except it is NOT an overflow. Rather it is wrapping. That is a big > > > point here. unsigned wraps and does NOT overflow. Yes there is a major > > > difference. > > > > Right, yes. I will try to pick my language very carefully. :) > > > > The practical problem I am trying to solve in the 30 million lines of > > Linux kernel code is that of catching arithmetic wrap-around. The > > problem is one of evolving the code -- I can't just drop -fwrapv and > > -fwrapv-pointer because it's not possible to fix all the cases at once. > > (And we really don't want to reintroduce undefined behavior.) > > > > So, for signed, pointer, and unsigned types, we need: > > > > a) No arithmetic UB -- everything needs to have deterministic behavior. > > The current solution here is "-fno-strict-overflow", which eliminates > > the UB and makes sure everything wraps. > > > > b) A way to run-time warn/trap on overflow/underflow/wrap-around. This > > would work with -fsanitize=[signed-integer|pointer]-overflow except > > due to "a)" we always wrap. And there isn't currently coverage like > > this for unsigned (in GCC). > > > > Our problem is that the kernel is filled with a mix of places where there > > is intended wrap-around and unintended wrap-around. We can chip away at > > fixing the intended wrap-around that we can find with static analyzers, > > etc, but at the end of the day there is a long tail of finding the places > > where intended wrap-around is hiding. But when the refactoring is > > sufficiently completely, we can move the wrap-around warning to a trap, > > and the kernel will not longer have this class of security flaw. > > > > As a real-world example, here is a bug where a u8 wraps around causing > > an under-allocation that allowed for a heap overwrite: > > > > https://git.kernel.org/linus/6311071a0562 > > https://elixir.bootlin.com/linux/v6.5/source/net/wireless/nl80211.c#L5422 > > > > If there were more than 255 elements in a linked list, the allocation > > would be too small, and the second loop would write past the end of the > > allocation. This is a pretty classic allocation underflow and linear > > heap write overflow security flaw. (And it would be trivially stopped by > > trapping on the u8 wrap around.) > > > > So, I want to be able to catch that at run-time. But we also have code > > doing things like "if (ulong + offset < ulong) { ... }": > > > > https://elixir.bootlin.com/linux/v6.5/source/drivers/crypto/axis/artpec6_crypto.c#L1187 > > > > This is easy for a static analyzer to find and we can replace it with a > > non-wrapping test (e.g. __builtin_add_overflow()), but we'll not find > > them all immediately, especially for the signed and pointer cases. > > > > So, I need to retain the "everything wraps" behavior while still being > > able to detect when it happens. > > > Hi Kees, > > I have a couple of questions: > > Currently, my thinking was that you would use signed integers > if you want the usual integer arithmetic rules we know from > elementary school and if you overflow this is clearly a bug > you can diagnose with UBsan. > > There are people who think that signed overflow should be > defined to wrap, but I think this would be a severe > mistake because then code would start to rely on it, which > makes it then difficult to differentiate between bugs and > intended uses (e.g. the unfortunate situation you have > with the kernel).
Right -- my goal is to migrate the kernel codebase into using unambiguous arithmetic. Doing that evolution, though, is the hard part. :) At present, the kernel treats all signed and pointer arithmetic as wrapping, as that makes sure that there is no UB, which causes must more unexpected problems than universally wrapping does. > I assume you want to combine UBSan plus wrapping for > production use? Or only for testing? Or in other words: > why would testing UBSan and production with wrapping > not be sufficient to find and fix all bugs? I want UBSan to catch wrapping in production -- there is going to be a long tail of code that may be vulnerable to having it be manipulated into wrapping a calculation. The stuff that _intentionally_ wraps will stand out very quickly and we can fix those rapidly. (Many we can find today with static analyzers, but not all from what I've seen.) > Wrapping would not be correct because it may lead to > logic errors or use-after-free etc. I assume it is still > preferred because it more deterministic than whatever comes > out of the optimizer assuming that overflow has UB. Is this > the reasoning applied here? Exactly correct. > For unsigned the intended use case is modulo arithmetic > where wrapping is the correct behavior. At least, this > is what I thought so far.. This seems also to be the > position of the overall GCC community rejecting > -fsanitize=unsigned-integer-overflow. That is my understanding as well: unsigned wrap is "intended behavior". But for the Linux kernel this behavior is still "unexpected" in the majority of places where it may be reachable. > But then there are also people like Richard Seacord that > have the position that one should use "unsigned" for > every quantity which can not be negative, which implies > that then the modulo use case becomes the exception. Sure -- this is the preferred coding style, but we both have tons of old code that uses "int", and we still almost always would interpret wrapping as an unwanted state. > Related to this I have the following question: In the > bug you refer to above, an unsigned variable was used > for something that is not meant for modulo arithmetic. > Is this done in the kernel to save space? I.e. because > 127 would not be enough as maximum but going to i16 takes > to much space? or is this for compatibility with some > on-the-wire protocol? I don't know the original rationale -- I expect the author wanted an unsigned variable but didn't expect to ever encounter > 255 entries. These kinds of expectations aren't uncommon in the kernel, and sometimes past expectations get violated when refactoring, etc. Basically I can't trust the sanity of the codebase, so I have to depend on the compiler to provide the runtime coverage to avoid these kinds of security flaws. > Would an attribute for variables help that tells the > compiler that if stores to the variable wrap around > then this is not intended and this is an error? > > u8 x [[nowrap]]; > > x = 256; // can be diagnosed ? I would need the reverse: I want to assume all math to not wrap, excepting a handful that are designed to. -fsanitize=nowrap u8 modulo [[wrapping]]; u8 counter; modulo = 256; // no diagnose counter = 256; // diagnose -- Kees Cook