Hi,

On Tue, May 14, 2024 at 3:38 AM Thomas Gleixner <t...@linutronix.de> wrote:
>
> On Tue, May 07 2024 at 04:34, Justin Stitt wrote:
> > Using syzkaller alongside the newly reintroduced signed integer overflow
> > sanitizer spits out this report:
> >
> > [ 138.454979] ------------[ cut here ]------------
> > [ 138.458089] UBSAN: signed-integer-overflow in ../kernel/time/ntp.c:461:16
> > [ 138.462134] 9223372036854775807 + 500 cannot be represented in type
> > 'long'
> > [ 138.466234] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> > 6.8.0-rc2-00038-gc0a509640e93-dirty #10
> > [ 138.471498] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> > 1.16.3-debian-1.16.3-2 04/01/2014
> > [ 138.477110] Call Trace:
> > [ 138.478657] <IRQ>
> > [ 138.479964] dump_stack_lvl+0x93/0xd0
> > [ 138.482276] handle_overflow+0x171/0x1b0
> > [ 138.484699] second_overflow+0x2d6/0x500
> > [ 138.487133] accumulate_nsecs_to_secs+0x60/0x160
> > [ 138.489931] timekeeping_advance+0x1fe/0x890
> > [ 138.492535] update_wall_time+0x10/0x30
>
> Same comment vs. trimming.

Gotcha, in the next version this will be trimmed.

>
> > Historically, the signed integer overflow sanitizer did not work in the
> > kernel due to its interaction with `-fwrapv` but this has since been
> > changed [1] in the newest version of Clang. It was re-enabled in the
> > kernel with Commit 557f8c582a9ba8ab ("ubsan: Reintroduce signed overflow
> > sanitizer").
>
> Again. Irrelevant to the problem.

Right, I'll move it below the fold.

>
> > Let's introduce a new macro and use that against NTP_PHASE_LIMIT to
> > properly limit the max size of time_maxerror without overflowing during
> > the check itself.
>
> This fails to tell what is causing the issue and just talks about what
> the patch is doing. The latter can be seen from the patch itself, no?
>
> Something like this:
>
> On second overflow time_maxerror is unconditionally incremented and
> the result is checked against NTP_PHASE_LIMIT, but the increment can
> overflow into negative space.
>
> Prevent this by checking the overflow condition before incrementing.
>
> Hmm?

Sounds better :thumbs_up: I'll use this!

>
> But that obviously begs the question why this can happen at all.
>
> #define MAXPHASE 500000000L
> #define NTP_PHASE_LIMIT ((MAXPHASE / NSEC_PER_USEC) << 5)
>
> ==> NTP_PHASE_LIMIT = 1.6e+07 = 0xf42400
>
> #define MAXFREQ 500000
>
> So how can 0xf42400 + 500000/1000 overflow in the first place?
>
> It can't unless time_maxerror is somehow initialized to a bogus
> value and indeed it is:
>
> process_adjtimex_modes()
> ....
>         if (txc->modes & ADJ_MAXERROR)
>                 time_maxerror = txc->maxerror;
>
> So that wants to be fixed and not the symptom.

Isn't this value usually supplied from userspace, so it can be pretty much
arbitrary? Are you suggesting we update timekeeping_validate_timex() to
include a check that limits the maxerror field to
(NTP_PHASE_LIMIT - (MAXFREQ / NSEC_PER_USEC))? It seems like we should
handle the overflow case where it happens: in second_overflow().

The clear intent of the existing code was to saturate at NTP_PHASE_LIMIT;
it just did so in a way where the check itself triggers the overflow
sanitizer.

>
> Thanks,
>
>         tglx

Thanks
Justin