Re: [RFC] Mitigating unexpected arithmetic overflow
On 5/8/2024 10:07 PM, Linus Torvalds wrote:
> And no, the answer is ABSOLUTELY NOT to add cognitive load on kernel
> developers by adding yet more random helper types and/or functions.

Just to show an option without "more types and helper functions", one could
also instead add a coverage requirement. Every arithmetic operation should
either:

- have a test case where the wrap-around happens, or
- have a static analyser say that overflow cannot happen, or
- have a static analyser say that overflow is fine (e.g., your a+b < a case)

Then the answer to safe wrap situations isn't to make the kernel code less
readable, but to have a component-level test that shows that the behavior on
overflow (in at least one case :-)) is what the developer expected.

For static analysis to prove that overflow cannot happen, one sometimes would
need to add BUG_ON() assertions to let the analyser know the assumptions on
surrounding code, which has its own benefits:

	static inline u32 __item_offset(u32 val)
	{
		BUG_ON(val > INT_MAX / ITEM_SIZE_PER_UNIT);
		return val * ITEM_SIZE_PER_UNIT;
	}

Obviously, the effort involved is still high. Maybe if someone proves first,
as a pet project, that something in this direction is actually worth the
effort (by uncovering a heap of bugs), one could offer this kind of check as
an opt-in.

Best wishes,
jonas
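To make the "coverage requirement" option concrete, the following is a
minimal sketch of what such an opt-in test might look like as a KUnit case.
The helper, suite and case names are invented for illustration and are not
part of any posted patch.

	#include <kunit/test.h>
	#include <linux/limits.h>

	/* Hypothetical helper standing in for "an arithmetic operation whose
	 * wrap-around is intended". Unsigned wrap is well defined (mod 2^32).
	 */
	static u32 wrapping_add_u32(u32 a, u32 b)
	{
		return a + b;
	}

	static void wrap_around_case(struct kunit *test)
	{
		/* The coverage requirement: at least one case where the wrap
		 * actually happens, documenting that it is expected. */
		KUNIT_EXPECT_EQ(test, wrapping_add_u32(U32_MAX, 1), 0U);
		KUNIT_EXPECT_EQ(test, wrapping_add_u32(U32_MAX, 2), 1U);
	}

	static struct kunit_case wrap_coverage_cases[] = {
		KUNIT_CASE(wrap_around_case),
		{}
	};

	static struct kunit_suite wrap_coverage_suite = {
		.name = "arith_wrap_coverage",
		.test_cases = wrap_coverage_cases,
	};
	kunit_test_suite(wrap_coverage_suite);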
Re: [PATCH] ntp: remove accidental integer wrap-around
On Thu, May 16 2024 at 16:40, Justin Stitt wrote:
> On Tue, May 14, 2024 at 3:38 AM Thomas Gleixner wrote:
>> So how can 0xf42400 + 50/1000 overflow in the first place?
>>
>> It can't unless time_maxerror is somehow initialized to a bogus
>> value and indeed it is:
>>
>>     process_adjtimex_modes()
>>
>>         if (txc->modes & ADJ_MAXERROR)
>>             time_maxerror = txc->maxerror;
>>
>> So that wants to be fixed and not the symptom.
>
> Isn't this usually supplied from the user and can be some pretty
> random stuff?

Sure it comes from user space and can contain random nonsense as syzkaller
demonstrated.

> Are you suggesting we update timekeeping_validate_timex() to include a
> check to limit the maxerror field to (NTP_PHASE_LIMIT-(MAXFREQ /
> NSEC_PER_USEC))? It seems like we should handle the overflow case
> where it happens: in second_overflow().
>
> The clear intent of the existing code was to saturate at
> NTP_PHASE_LIMIT, they just did it in a way where the check itself
> triggers overflow sanitizers.

The clear intent of the code is to do saturation of a bound value.

Clearly the user space interface fails to validate the input to be in a sane
range, and that makes you go and prevent the resulting overflow at the usage
site. Seriously?

Obviously the sanitizer detects the stupid in second_overflow(), but that
does not mean that the proper solution is to add overflow protection to that
code. Tools are good to pin-point symptoms, but they are by definition
patently bad at root cause analysis. Otherwise we could just let the tool
write the "fix".

The obvious first question in such a case is to ask _WHY_ does time_maxerror
have a bogus value, which clearly cannot be achieved from regular operation.

Once you figured out that the only way to set time_maxerror to a bogus value
is the user space interface, the obvious follow-up question is whether such a
value has to be considered valid or not.

As it is obviously invalid, the logical consequence is to add a sanity check
and reject that nonsense at that boundary, no?

Thanks,

        tglx
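For reference, the kind of boundary check being asked for here looks roughly
like the sketch below; it is essentially the shape the v2 patch later in this
thread gives to timekeeping_validate_timex(). The standalone function name is
only for illustration.

	/* Sketch only: validate the user-supplied value at the ABI boundary
	 * instead of papering over the overflow at the usage site. */
	static int validate_maxerror(const struct __kernel_timex *txc)
	{
		if (txc->modes & ADJ_MAXERROR) {
			if (txc->maxerror < 0 || txc->maxerror > NTP_PHASE_LIMIT)
				return -EINVAL;
		}
		return 0;
	}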
Re: [PATCH 05/12] dmaengine: Add STM32 DMA3 support
On 5/16/24 19:09, Frank Li wrote:
> On Thu, May 16, 2024 at 05:25:58PM +0200, Amelie Delaunay wrote:
>> On 5/15/24 20:56, Frank Li wrote:
>>> On Tue, Apr 23, 2024 at 02:32:55PM +0200, Amelie Delaunay wrote:
>>>> STM32 DMA3 driver supports the 3 hardware configurations of the STM32
>>>> DMA3 controller:
...
>>>> +	writel_relaxed(hwdesc->cdar, ddata->base + STM32_DMA3_CDAR(id));
>>>> +	writel_relaxed(hwdesc->cllr, ddata->base + STM32_DMA3_CLLR(id));
>>>> +
>>>> +	/* Clear any pending interrupts */
>>>> +	csr = readl_relaxed(ddata->base + STM32_DMA3_CSR(id));
>>>> +	if (csr & CSR_ALL_F)
>>>> +		writel_relaxed(csr, ddata->base + STM32_DMA3_CFCR(id));
>>>> +
>>>> +	stm32_dma3_chan_dump_reg(chan);
>>>> +
>>>> +	ccr = readl_relaxed(ddata->base + STM32_DMA3_CCR(id));
>>>> +	writel_relaxed(ccr | CCR_EN, ddata->base + STM32_DMA3_CCR(id));
>>>
>>> This one should use writel() instead of writel_relaxed() because it
>>> needs dma_wmb() as a barrier for the previous writes to complete.
>>>
>>> Frank
>>
>> ddata->base is Device memory type thanks to ioremap() use, so it is
>> strongly ordered and non-cacheable.
>> DMA3 is outside the CPU cluster, its registers are accessible through
>> the AHB bus.
>> dma_wmb() (in case of writel() instead of writel_relaxed()) is useless
>> in that case: it won't ensure the propagation on the bus is complete,
>> and it will have impacts on the system.
>> That's why the CCR register is written once, then it is read before
>> CCR_EN is set and written again, with _relaxed(), because registers are
>> behind a bus, and ioremapped with Device memory type which ensures it
>> is strongly ordered and non-cacheable.
>
> Regardless of the memory map, writel_relaxed() just makes sure I/O writes
> and reads are ordered, not necessarily ordered with other memory accesses.
> Only readl() and writel() ensure ordering with other memory reads/writes.
>
> 1. Write src_addr to descriptor
> 2. dma_wmb()
> 3. Write "ready" to descriptor
> 4. Enable channel or doorbell by writing a register.
>
> If 4 uses writel_relaxed(), then because 3 writes to DDR, which is a
> different place from MMIO, 4 may happen before 3. You can refer to the
> AXI ordering model.
>
> 4 has to use ONE writel(), to make sure 3 has already reached DDR.
>
> You need to use at least one writel() to make sure all normal memory
> accesses have finished.

+	writel_relaxed(chan->swdesc->ccr, ddata->base + STM32_DMA3_CCR(id));
+	writel_relaxed(hwdesc->ctr1, ddata->base + STM32_DMA3_CTR1(id));
+	writel_relaxed(hwdesc->ctr2, ddata->base + STM32_DMA3_CTR2(id));
+	writel_relaxed(hwdesc->cbr1, ddata->base + STM32_DMA3_CBR1(id));
+	writel_relaxed(hwdesc->csar, ddata->base + STM32_DMA3_CSAR(id));
+	writel_relaxed(hwdesc->cdar, ddata->base + STM32_DMA3_CDAR(id));
+	writel_relaxed(hwdesc->cllr, ddata->base + STM32_DMA3_CLLR(id));

These writel_relaxed() are from descriptors to DMA3 registers (descriptors
being prepared "a long time ago" during _prep_).
As I said previously, DMA3 registers are outside the CPU cluster, accessible
through the AHB bus, and ddata->base to address registers is ioremapped as
Device memory type, non-cacheable and strongly ordered.

arch/arm/include/asm/io.h:
/*
 * ioremap() and friends.
 *
 * ioremap() takes a resource address, and size.  Due to the ARM memory
 * types, it is important to use the correct ioremap() function as each
 * mapping has specific properties.
 *
 * Function		Memory type	Cacheability	Cache hint
 * ioremap()		Device		n/a		n/a
 * ioremap_cache()	Normal		Writeback	Read allocate
 * ioremap_wc()		Normal		Non-cacheable	n/a
 * ioremap_wt()		Normal		Non-cacheable	n/a
 *
 * All device mappings have the following properties:
 * - no access speculation
 * - no repetition (eg, on return from an exception)
 * - number, order and size of accesses are maintained
 * - unaligned accesses are "unpredictable"
 * - writes may be delayed before they hit the endpoint device
 */

On our platforms, we know that to ensure the writes have hit the endpoint
device (aka DMA3 registers), a read has to be done before. And that's what
is done before enabling the channel:

+	ccr = readl_relaxed(ddata->base + STM32_DMA3_CCR(id));
+	writel_relaxed(ccr | CCR_EN, ddata->base + STM32_DMA3_CCR(id));

If there was an issue in this part of the code, it would mean the channel
could be started while it is wrongly programmed. In that case, DMA3 would
raise a User Setting Error interrupt and disable the channel. User Setting
Error is managed in this driver (USEF/stm32_dma3_check_user_setting()). And
we have never reached such a situation.

+
+	chan->dma_status = DMA_IN_PROGRESS;
+
+	dev_dbg(chan2dev(chan), "vchan %pK: started\n", &chan->vchan);
+}
+
+static int stm32_dma3_chan_suspend(struct stm32_dma3_chan *chan, bool susp)
+{
+	struct stm32_dma3_ddata *ddata = to_stm32_dma3_ddata(chan);
+	u32 csr, ccr = readl_relaxed(ddata->base + STM32_DMA3_CCR(chan->id)) & ~CCR_EN;
+	int ret = 0;
+
+	if (susp)
+		ccr |= CCR_SUSP;
+	else
+		ccr &= ~CCR_SUSP;
+
[PATCH v2] wifi: mac80211: Avoid address calculations via out of bounds array indexing
req->n_channels must be set before req->channels[] can be used. This patch fixes one of the issues encountered in [1]. [ 83.964252] [ cut here ] [ 83.964255] UBSAN: array-index-out-of-bounds in net/mac80211/scan.c:364:4 [ 83.964258] index 0 is out of range for type 'struct ieee80211_channel *[]' [ 83.964260] CPU: 0 PID: 1695 Comm: iwd Tainted: G OT 6.8.9-gentoo-hardened1 #1 [ 83.964262] Hardware name: System76 Pangolin/Pangolin, BIOS ARB928_V00.01_T0025ASY1_ms 04/20/2023 [ 83.964264] Call Trace: [ 83.964267] [ 83.964269] dump_stack_lvl+0x3f/0xc0 [ 83.964274] __ubsan_handle_out_of_bounds+0xec/0x110 [ 83.964278] ieee80211_prep_hw_scan+0x2db/0x4b0 [ 83.964281] __ieee80211_start_scan+0x601/0x990 [ 83.964284] ? srso_alias_return_thunk+0x5/0xfbef5 [ 83.964287] ? cfg80211_scan+0x149/0x250 [ 83.964291] nl80211_trigger_scan+0x874/0x980 [ 83.964295] genl_family_rcv_msg_doit+0xe8/0x160 [ 83.964298] genl_rcv_msg+0x240/0x270 [ 83.964301] ? __cfi_nl80211_trigger_scan+0x10/0x10 [ 83.964302] ? __cfi_nl80211_post_doit+0x10/0x10 [ 83.964304] ? __cfi_nl80211_pre_doit+0x10/0x10 [ 83.964307] ? __cfi_genl_rcv_msg+0x10/0x10 [ 83.964309] netlink_rcv_skb+0x102/0x130 [ 83.964312] genl_rcv+0x23/0x40 [ 83.964314] netlink_unicast+0x23b/0x340 [ 83.964316] netlink_sendmsg+0x3a9/0x450 [ 83.964319] __sys_sendto+0x3ae/0x3c0 [ 83.964324] __x64_sys_sendto+0x21/0x40 [ 83.964326] do_syscall_64+0x90/0x150 [ 83.964329] ? srso_alias_return_thunk+0x5/0xfbef5 [ 83.964331] ? syscall_exit_work+0xc2/0xf0 [ 83.964333] ? srso_alias_return_thunk+0x5/0xfbef5 [ 83.964335] ? syscall_exit_to_user_mode+0x74/0xa0 [ 83.964337] ? srso_alias_return_thunk+0x5/0xfbef5 [ 83.964339] ? do_syscall_64+0x9c/0x150 [ 83.964340] ? srso_alias_return_thunk+0x5/0xfbef5 [ 83.964342] ? syscall_exit_to_user_mode+0x74/0xa0 [ 83.964344] ? srso_alias_return_thunk+0x5/0xfbef5 [ 83.964346] ? do_syscall_64+0x9c/0x150 [ 83.964347] ? srso_alias_return_thunk+0x5/0xfbef5 [ 83.964349] ? do_syscall_64+0x9c/0x150 [ 83.964351] ? srso_alias_return_thunk+0x5/0xfbef5 [ 83.964353] ? syscall_exit_work+0xc2/0xf0 [ 83.964354] ? srso_alias_return_thunk+0x5/0xfbef5 [ 83.964356] ? syscall_exit_to_user_mode+0x74/0xa0 [ 83.964358] ? srso_alias_return_thunk+0x5/0xfbef5 [ 83.964359] ? do_syscall_64+0x9c/0x150 [ 83.964361] ? srso_alias_return_thunk+0x5/0xfbef5 [ 83.964362] ? do_user_addr_fault+0x488/0x620 [ 83.964366] ? srso_alias_return_thunk+0x5/0xfbef5 [ 83.964367] ? 
srso_alias_return_thunk+0x5/0xfbef5 [ 83.964369] entry_SYSCALL_64_after_hwframe+0x55/0x5d [ 83.964372] RIP: 0033:0x6200808578d7 [ 83.964374] Code: 00 00 90 f3 0f 1e fa 41 56 55 41 89 ce 48 83 ec 28 80 3d 7b f7 0d 00 00 74 29 45 31 c9 45 31 c0 41 89 ca b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 71 48 83 c4 28 5d 41 5e c3 66 0f 1f 84 00 00 [ 83.964375] RSP: 002b:730c4e821530 EFLAGS: 0246 ORIG_RAX: 002c [ 83.964378] RAX: ffda RBX: 06dbc456c570 RCX: 6200808578d7 [ 83.964379] RDX: 005c RSI: 06dbc45884f0 RDI: 0004 [ 83.964381] RBP: 0004 R08: R09: [ 83.964382] R10: R11: 0246 R12: 06dbc456c480 [ 83.964383] R13: 06dbc456c450 R14: R15: 06dbc456c610 [ 83.964386] [ 83.964386] ---[ end trace ]--- [1] https://bugzilla.kernel.org/show_bug.cgi?id=218810 v1->v2: - Drop changes in cfg80211 as requested by Johannes Co-authored-by: Kees Cook Signed-off-by: Kenton Groombridge --- net/mac80211/scan.c | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/net/mac80211/scan.c b/net/mac80211/scan.c index 73850312580f..b88e99c211ff 100644 --- a/net/mac80211/scan.c +++ b/net/mac80211/scan.c @@ -358,7 +358,8 @@ static bool ieee80211_prep_hw_scan(struct ieee80211_sub_if_data *sdata) struct cfg80211_scan_request *req; struct cfg80211_chan_def chandef; u8 bands_used = 0; - int i, ielen, n_chans; + int i, ielen; + u32 *n_chans; u32 flags = 0; req = rcu_dereference_protected(local->scan_req, @@ -368,34 +369,34 @@ static bool ieee80211_prep_hw_scan(struct ieee80211_sub_if_data *sdata) return false; if (ieee80211_hw_check(&local->hw, SINGLE_SCAN_ON_ALL_BANDS)) { + local->hw_scan_req->req.n_channels = req->n_channels; + for (i = 0; i < req->n_channels; i++) { local->hw_scan_req->req.channels[i] = req->channels[i]; bands_used |= BIT(req->channels[i]->band); } - - n_chans = req->n_channels; } else { do { if (local->hw_scan_band == NUM_NL80211_BANDS) return false; - n_chan
Re: [PATCH 05/12] dmaengine: Add STM32 DMA3 support
On Fri, May 17, 2024 at 11:42:17AM +0200, Amelie Delaunay wrote: > On 5/16/24 19:09, Frank Li wrote: > > On Thu, May 16, 2024 at 05:25:58PM +0200, Amelie Delaunay wrote: > > > On 5/15/24 20:56, Frank Li wrote: > > > > On Tue, Apr 23, 2024 at 02:32:55PM +0200, Amelie Delaunay wrote: > > > > > STM32 DMA3 driver supports the 3 hardware configurations of the STM32 > > > > > DMA3 > > > > > controller: > > ... > > > > > + writel_relaxed(hwdesc->cdar, ddata->base + STM32_DMA3_CDAR(id)); > > > > > + writel_relaxed(hwdesc->cllr, ddata->base + STM32_DMA3_CLLR(id)); > > > > > + > > > > > + /* Clear any pending interrupts */ > > > > > + csr = readl_relaxed(ddata->base + STM32_DMA3_CSR(id)); > > > > > + if (csr & CSR_ALL_F) > > > > > + writel_relaxed(csr, ddata->base + STM32_DMA3_CFCR(id)); > > > > > + > > > > > + stm32_dma3_chan_dump_reg(chan); > > > > > + > > > > > + ccr = readl_relaxed(ddata->base + STM32_DMA3_CCR(id)); > > > > > + writel_relaxed(ccr | CCR_EN, ddata->base + STM32_DMA3_CCR(id)); > > > > > > > > This one should use writel instead of writel_relaxed because it need > > > > dma_wmb() as barrier for preious write complete. > > > > > > > > Frank > > > > > > > > > > ddata->base is Device memory type thanks to ioremap() use, so it is > > > strongly > > > ordered and non-cacheable. > > > DMA3 is outside CPU cluster, its registers are accessible through AHB bus. > > > dma_wmb() (in case of writel instead of writel_relaxed) is useless in that > > > case: it won't ensure the propagation on the bus is complete, and it will > > > have impacts on the system. > > > That's why CCR register is written once, then it is read before CCR_EN is > > > set and being written again, with _relaxed(), because registers are > > > behind a > > > bus, and ioremapped with Device memory type which ensures it is strongly > > > ordered and non-cacheable. > > > > regardless memory map, writel_relaxed() just make sure io write and read is > > orderred, not necessary order with other memory access. only readl and > > writel make sure order with other memory read/write. > > > > 1. Write src_addr to descriptor > > 2. dma_wmb() > > 3. Write "ready" to descriptor > > 4. enable channel or doorbell by write a register. > > > > if 4 use writel_relaxe(). because 3 write to DDR, which difference place of > > mmio, 4 may happen before 3. Your can refer axi order model. > > > > 4 have to use ONE writel(), to make sure 3 already write to DDR. > > > > You need use at least one writel() to make sure all nornmal memory finish. > > > > +writel_relaxed(chan->swdesc->ccr, ddata->base + STM32_DMA3_CCR(id)); > +writel_relaxed(hwdesc->ctr1, ddata->base + STM32_DMA3_CTR1(id)); > +writel_relaxed(hwdesc->ctr2, ddata->base + STM32_DMA3_CTR2(id)); > +writel_relaxed(hwdesc->cbr1, ddata->base + STM32_DMA3_CBR1(id)); > +writel_relaxed(hwdesc->csar, ddata->base + STM32_DMA3_CSAR(id)); > +writel_relaxed(hwdesc->cdar, ddata->base + STM32_DMA3_CDAR(id)); > +writel_relaxed(hwdesc->cllr, ddata->base + STM32_DMA3_CLLR(id)); > > These writel_relaxed() are from descriptors to DMA3 registers (descriptors > being prepared "a long time ago" during _prep_). You can't depend on "a long time ago" during _prep_. If later your driver run at fast CPU. The execute time will be short. All dma_map_sg and dma_alloc_coherence ... need at least one writel() to make sure previous write actually reach to DDR. Some data may not really reach DDR, when DMA already start transfer Please ref linux kernel document: Documentation/memory-barriers.txt, line 1948. 
In your issue_pending(), call this function to enable channel. So need at least one writel(). > As I said previously, DMA3 registers are outside CPU cluster, accessible > through AHB bus, and ddata->base to address registers is ioremapped as > Device memory type, non-cacheable and strongly ordered. > > arch/arm/include/asm/io.h: > /* > * ioremap() and friends. > * > * ioremap() takes a resource address, and size. Due to the ARM memory > * types, it is important to use the correct ioremap() function as each > * mapping has specific properties. > * > * FunctionMemory type CacheabilityCache hint > * *ioremap()* *Device**n/a* *n/a* > * ioremap_cache() Normal Writeback Read allocate > * ioremap_wc()Normal Non-cacheable n/a > * ioremap_wt()Normal Non-cacheable n/a > * > * All device mappings have the following properties: > * - no access speculation > * - no repetition (eg, on return from an exception) > * - number, order and size of accesses are maintained > * - unaligned accesses are "unpredictable" > * - writes may be delayed before they hit the endpoint device > > On our platforms, we know that to ensure the writes have hit the endpoint > device (aka DMA3 registers), a read have to be done before. > And that's wha
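The ordering concern discussed in this thread can be condensed into the
following sketch: the descriptor writes go to normal (DDR) memory, and the
doorbell write to MMIO must not be observed by the device before them. The
structure, field and register names here are illustrative only, not the
driver's real API.

	/* Sketch of the descriptor/doorbell pattern under discussion. */
	static void dma3_issue_pending(struct dma3_chan *chan)
	{
		struct dma3_desc *d = chan->desc;  /* normal (DDR) memory */

		d->src_addr = chan->src;	/* 1. fill the descriptor      */
		dma_wmb();			/* 2. order the two DDR writes */
		d->ready = 1;			/* 3. mark it ready            */

		/*
		 * 4. Ring the doorbell. writel() (unlike writel_relaxed())
		 * guarantees the DDR writes above are visible to the device
		 * before the MMIO write that starts the transfer.
		 */
		writel(CCR_EN, chan->base + DMA3_CCR);
	}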
[PATCH 2/2] wifi: mac80211: fix UBSAN noise in ieee80211_prep_hw_scan()
When testing the previous patch with CONFIG_UBSAN_BOUNDS, I've noticed the following: UBSAN: array-index-out-of-bounds in net/mac80211/scan.c:372:4 index 0 is out of range for type 'struct ieee80211_channel *[]' CPU: 0 PID: 1435 Comm: wpa_supplicant Not tainted 6.9.0+ #1 Hardware name: LENOVO 20UN005QRT/20UN005QRT <...BIOS details...> Call Trace: dump_stack_lvl+0x2d/0x90 __ubsan_handle_out_of_bounds+0xe7/0x140 ? timerqueue_add+0x98/0xb0 ieee80211_prep_hw_scan+0x2db/0x480 [mac80211] ? __kmalloc+0xe1/0x470 __ieee80211_start_scan+0x541/0x760 [mac80211] rdev_scan+0x1f/0xe0 [cfg80211] nl80211_trigger_scan+0x9b6/0xae0 [cfg80211] ... Since '__ieee80211_start_scan()' leaves 'hw_scan_req->req.n_channels' uninitialized, actual boundaries of 'hw_scan_req->req.channels' can't be checked in 'ieee80211_prep_hw_scan()'. Although an initialization of 'hw_scan_req->req.n_channels' introduces some confusion around allocated vs. used VLA members, this shouldn't be a problem since everything is correctly adjusted soon in 'ieee80211_prep_hw_scan()'. Cleanup 'kmalloc()' math in '__ieee80211_start_scan()' by using the convenient 'struct_size()' as well. Signed-off-by: Dmitry Antipov --- net/mac80211/scan.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/net/mac80211/scan.c b/net/mac80211/scan.c index 3da1c5c45035..ab2a11a02673 100644 --- a/net/mac80211/scan.c +++ b/net/mac80211/scan.c @@ -745,14 +745,19 @@ static int __ieee80211_start_scan(struct ieee80211_sub_if_data *sdata, } local->hw_scan_req = kmalloc( - sizeof(*local->hw_scan_req) + - req->n_channels * sizeof(req->channels[0]) + - local->hw_scan_ies_bufsize, GFP_KERNEL); + (struct_size(local->hw_scan_req, +req.channels, req->n_channels) ++ local->hw_scan_ies_bufsize), GFP_KERNEL); if (!local->hw_scan_req) return -ENOMEM; local->hw_scan_req->req.ssids = req->ssids; local->hw_scan_req->req.n_ssids = req->n_ssids; + /* None of the channels are actually set +* up but let UBSAN know the boundaries. +*/ + local->hw_scan_req->req.n_channels = req->n_channels; + ies = (u8 *)local->hw_scan_req + sizeof(*local->hw_scan_req) + req->n_channels * sizeof(req->channels[0]); -- 2.45.1
[PATCH 1/2] wifi: cfg80211: use __counted_by where appropriate
Annotate 'sub_specs' of 'struct cfg80211_sar_specs', 'channels' of 'struct cfg80211_sched_scan_request', 'channels' of 'struct cfg80211_wowlan_nd_match', and 'matches' of 'struct cfg80211_wowlan_nd_info' with '__counted_by' attribute. Briefly tested with clang 18.1.1 and CONFIG_UBSAN_BOUNDS running iwlwifi. Signed-off-by: Dmitry Antipov --- include/net/cfg80211.h | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index cbf1664dc569..d79180bec7a1 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -2200,7 +2200,7 @@ struct cfg80211_sar_sub_specs { struct cfg80211_sar_specs { enum nl80211_sar_type type; u32 num_sub_specs; - struct cfg80211_sar_sub_specs sub_specs[]; + struct cfg80211_sar_sub_specs sub_specs[] __counted_by(num_sub_specs); }; @@ -2838,7 +2838,7 @@ struct cfg80211_sched_scan_request { struct list_head list; /* keep last */ - struct ieee80211_channel *channels[]; + struct ieee80211_channel *channels[] __counted_by(n_channels); }; /** @@ -3582,7 +3582,7 @@ struct cfg80211_coalesce { struct cfg80211_wowlan_nd_match { struct cfg80211_ssid ssid; int n_channels; - u32 channels[]; + u32 channels[] __counted_by(n_channels); }; /** @@ -3596,7 +3596,7 @@ struct cfg80211_wowlan_nd_match { */ struct cfg80211_wowlan_nd_info { int n_matches; - struct cfg80211_wowlan_nd_match *matches[]; + struct cfg80211_wowlan_nd_match *matches[] __counted_by(n_matches); }; /** -- 2.45.1
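As a generic illustration (not code from cfg80211 or mac80211), the pattern
these __counted_by annotations enable looks like the sketch below: the
counter member must be assigned before the flexible array is touched, or
CONFIG_UBSAN_BOUNDS will flag the access — which is exactly the ordering
issue the mac80211 patch above deals with. The structure and function names
are made up.

	#include <linux/overflow.h>
	#include <linux/slab.h>

	/* Illustrative structure; not a real kernel struct. */
	struct foo {
		int count;
		u32 data[] __counted_by(count);
	};

	static struct foo *foo_alloc(int n)
	{
		struct foo *p = kzalloc(struct_size(p, data, n), GFP_KERNEL);

		if (!p)
			return NULL;

		/* Set the bound before any data[] access, so UBSAN/FORTIFY
		 * can check indexing against it. */
		p->count = n;
		return p;
	}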
[PATCH v3 0/2] tty: rfcomm: refactor rfcomm_get_dev_list() function
This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1][2]. As the "dl" variable is a pointer to "struct rfcomm_dev_list_req" and this structure ends in a flexible array: struct rfcomm_dev_list_req { [...] struct rfcomm_dev_info dev_info[]; }; the preferred way in the kernel is to use the struct_size() helper to do the arithmetic instead of the calculation "size + count * size" in the kzalloc() and copy_to_user() functions. At the same time, prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). In this case, it is important to note that the logic needs a little refactoring to ensure that the "dev_num" member is initialized before the first access to the flex array. Specifically, add the assignment before the list_for_each_entry() loop. Also remove the "size" variable as it is no longer needed and refactor the list_for_each_entry() loop to use di[n] instead of (di + n). This way, the code is more readable, idiomatic and safer. This code was detected with the help of Coccinelle, and audited and modified manually. Specifically, the first patch is related to the struct_size() helper and the second patch refactors the list_for_each_entry() loop to use array indexing instead of pointer arithmetic. Regards, Erick Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1] Link: https://github.com/KSPP/linux/issues/160 [2] --- Changes in v3: - Add the "Reviewed-by:" tags. - Split the changes in two commits (Jiri Slaby, Luiz Augusto von Dentz). Changes in v2: - Add the __counted_by() attribute (Kees Cook). - Refactor the list_for_each_entry() loop to use di[n] instead of (di + n) (Kees Cook). Previous versions: v2 -> https://lore.kernel.org/linux-hardening/as8pr02mb7237262c62b054fabd7229168b...@as8pr02mb7237.eurprd02.prod.outlook.com/ v1 -> https://lore.kernel.org/linux-hardening/as8pr02mb723725e0069f7ae8f64e876e8b...@as8pr02mb7237.eurprd02.prod.outlook.com/ --- Erick Archer (2): tty: rfcomm: prefer struct_size over open coded arithmetic tty: rfcomm: prefer array indexing over pointer arithmetic include/net/bluetooth/rfcomm.h | 2 +- net/bluetooth/rfcomm/tty.c | 23 ++- 2 files changed, 11 insertions(+), 14 deletions(-) -- 2.25.1
[PATCH v3 1/2] tty: rfcomm: prefer struct_size over open coded arithmetic
This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1][2]. As the "dl" variable is a pointer to "struct rfcomm_dev_list_req" and this structure ends in a flexible array: struct rfcomm_dev_list_req { [...] struct rfcomm_dev_info dev_info[]; }; the preferred way in the kernel is to use the struct_size() helper to do the arithmetic instead of the calculation "size + count * size" in the kzalloc() and copy_to_user() functions. At the same time, prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). In this case, it is important to note that the logic needs a little refactoring to ensure that the "dev_num" member is initialized before the first access to the flex array. Specifically, add the assignment before the list_for_each_entry() loop. Also remove the "size" variable as it is no longer needed. This way, the code is more readable and safer. This code was detected with the help of Coccinelle, and audited and modified manually. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1] Link: https://github.com/KSPP/linux/issues/160 [2] Reviewed-by: Kees Cook Signed-off-by: Erick Archer --- include/net/bluetooth/rfcomm.h | 2 +- net/bluetooth/rfcomm/tty.c | 11 --- 2 files changed, 5 insertions(+), 8 deletions(-) diff --git a/include/net/bluetooth/rfcomm.h b/include/net/bluetooth/rfcomm.h index 99d26879b02a..c05882476900 100644 --- a/include/net/bluetooth/rfcomm.h +++ b/include/net/bluetooth/rfcomm.h @@ -355,7 +355,7 @@ struct rfcomm_dev_info { struct rfcomm_dev_list_req { u16 dev_num; - struct rfcomm_dev_info dev_info[]; + struct rfcomm_dev_info dev_info[] __counted_by(dev_num); }; int rfcomm_dev_ioctl(struct sock *sk, unsigned int cmd, void __user *arg); diff --git a/net/bluetooth/rfcomm/tty.c b/net/bluetooth/rfcomm/tty.c index 69c75c041fe1..44b781e7569e 100644 --- a/net/bluetooth/rfcomm/tty.c +++ b/net/bluetooth/rfcomm/tty.c @@ -504,7 +504,7 @@ static int rfcomm_get_dev_list(void __user *arg) struct rfcomm_dev *dev; struct rfcomm_dev_list_req *dl; struct rfcomm_dev_info *di; - int n = 0, size, err; + int n = 0, err; u16 dev_num; BT_DBG(""); @@ -515,12 +515,11 @@ static int rfcomm_get_dev_list(void __user *arg) if (!dev_num || dev_num > (PAGE_SIZE * 4) / sizeof(*di)) return -EINVAL; - size = sizeof(*dl) + dev_num * sizeof(*di); - - dl = kzalloc(size, GFP_KERNEL); + dl = kzalloc(struct_size(dl, dev_info, dev_num), GFP_KERNEL); if (!dl) return -ENOMEM; + dl->dev_num = dev_num; di = dl->dev_info; mutex_lock(&rfcomm_dev_lock); @@ -542,9 +541,7 @@ static int rfcomm_get_dev_list(void __user *arg) mutex_unlock(&rfcomm_dev_lock); dl->dev_num = n; - size = sizeof(*dl) + n * sizeof(*di); - - err = copy_to_user(arg, dl, size); + err = copy_to_user(arg, dl, struct_size(dl, dev_info, n)); kfree(dl); return err ? -EFAULT : 0; -- 2.25.1
[PATCH v3 2/2] tty: rfcomm: prefer array indexing over pointer arithmetic
Refactor the list_for_each_entry() loop of rfcomm_get_dev_list() function to use array indexing instead of pointer arithmetic. This way, the code is more readable and idiomatic. Reviewed-by: Kees Cook Signed-off-by: Erick Archer --- net/bluetooth/rfcomm/tty.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/net/bluetooth/rfcomm/tty.c b/net/bluetooth/rfcomm/tty.c index 44b781e7569e..af80d599c337 100644 --- a/net/bluetooth/rfcomm/tty.c +++ b/net/bluetooth/rfcomm/tty.c @@ -527,12 +527,12 @@ static int rfcomm_get_dev_list(void __user *arg) list_for_each_entry(dev, &rfcomm_dev_list, list) { if (!tty_port_get(&dev->port)) continue; - (di + n)->id = dev->id; - (di + n)->flags = dev->flags; - (di + n)->state = dev->dlc->state; - (di + n)->channel = dev->channel; - bacpy(&(di + n)->src, &dev->src); - bacpy(&(di + n)->dst, &dev->dst); + di[n].id = dev->id; + di[n].flags = dev->flags; + di[n].state = dev->dlc->state; + di[n].channel = dev->channel; + bacpy(&di[n].src, &dev->src); + bacpy(&di[n].dst, &dev->dst); tty_port_put(&dev->port); if (++n >= dev_num) break; -- 2.25.1
Re: [PATCH] arm64: smp: smp_send_stop() and crash_smp_send_stop() should try non-NMI first
Hi, On Fri, Apr 12, 2024 at 6:55 AM Will Deacon wrote: > > Hi Doug, > > I'm doing some inbox Spring cleaning! No worries. I got your reply while I was on a bunch of business travel and finally cleared stuff out enough to take a look again. ;-) > On Thu, Dec 07, 2023 at 05:02:56PM -0800, Douglas Anderson wrote: > > When testing hard lockup handling on my sc7180-trogdor-lazor device > > with pseudo-NMI enabled, with serial console enabled and with kgdb > > disabled, I found that the stack crawls printed to the serial console > > ended up as a jumbled mess. After rebooting, the pstore-based console > > looked fine though. Also, enabling kgdb to trap the panic made the > > console look fine and avoided the mess. > > > > After a bit of tracking down, I came to the conclusion that this was > > what was happening: > > 1. The panic path was stopping all other CPUs with > >panic_other_cpus_shutdown(). > > 2. At least one of those other CPUs was in the middle of printing to > >the serial console and holding the console port's lock, which is > >grabbed with "irqsave". ...but since we were stopping with an NMI > >we didn't care about the "irqsave" and interrupted anyway. > > 3. Since we stopped the CPU while it was holding the lock it would > >never release it. > > 4. All future calls to output to the console would end up failing to > >get the lock in qcom_geni_serial_console_write(). This isn't > >_totally_ unexpected at panic time but it's a code path that's not > >well tested, hard to get right, and apparently doesn't work > >terribly well on the Qualcomm geni serial driver. > > > > It would probably be a reasonable idea to try to make the Qualcomm > > geni serial driver work better, but also it's nice not to get into > > this situation in the first place. > > > > Taking a page from what x86 appears to do in native_stop_other_cpus(), > > let's do this: > > 1. First, we'll try to stop other CPUs with a normal IPI and wait a > >second. This gives them a chance to leave critical sections. > > 2. If CPUs fail to stop then we'll retry with an NMI, but give a much > >lower timeout since there's no good reason for a CPU not to react > >quickly to a NMI. > > > > This works well and avoids the corrupted console and (presumably) > > could help avoid other similar issues. > > > > In order to do this, we need to do a little re-organization of our > > IPIs since we don't have any more free IDs. We'll do what was > > suggested in previous conversations and combine "stop" and "crash > > stop". That frees up an IPI so now we can have a "stop" and "stop > > NMI". > > > > In order to do this we also need a slight change in the way we keep > > track of which CPUs still need to be stopped. We need to know > > specifically which CPUs haven't stopped yet when we fall back to NMI > > but in the "crash stop" case the "cpu_online_mask" isn't updated as > > CPUs go down. This is why that code path had an atomic of the number > > of CPUs left. We'll solve this by making the cpumask into a > > global. This has a potential memory implication--with NR_CPUs = 4096 > > this is 4096/8 = 512 bytes of globals. On the upside in that same case > > we take 512 bytes off the stack which could potentially have made the > > stop code less reliable. It can be noted that the NMI backtrace code > > (lib/nmi_backtrace.c) uses the same approach and that use also > > confirms that updating the mask is safe from NMI. 
> > Updating the global masks without any synchronisation feels broken though: > > > @@ -1085,77 +1080,75 @@ void smp_send_stop(void) > > { > > unsigned long timeout; > > > > - if (num_other_online_cpus()) { > > - cpumask_t mask; > > + /* > > + * If this cpu is the only one alive at this point in time, online or > > + * not, there are no stop messages to be sent around, so just back > > out. > > + */ > > + if (num_other_online_cpus() == 0) > > + goto skip_ipi; > > > > - cpumask_copy(&mask, cpu_online_mask); > > - cpumask_clear_cpu(smp_processor_id(), &mask); > > + cpumask_copy(to_cpumask(stop_mask), cpu_online_mask); > > + cpumask_clear_cpu(smp_processor_id(), to_cpumask(stop_mask)); > > I don't see what prevents multiple CPUs getting in here concurrently and > tripping over the masks. x86 seems to avoid that with an atomic > 'stopping_cpu' variable in native_stop_other_cpus(). Do we need something > similar? Good point. nmi_trigger_cpumask_backtrace(), which my code was based on, has a test_and_set() for this and that seems simpler than the atomic_try_cmpxchg() from the x86 code. If we run into that case, what do you think we should do? I guess x86 just does a "return", though it feels like at least a warning should be printed since we're not doing what the function asked us to do. When we return there will be other CPUs running. In theory, we could try to help the other processor along? I don't know how
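For reference, the test_and_set() style guard mentioned here (the approach
nmi_trigger_cpumask_backtrace() takes with its backtrace flag, versus x86's
atomic 'stopping_cpu' cmpxchg) would look roughly like the sketch below; the
flag name and the warning are illustrative, not a proposed patch.

	/* Sketch: let only the first caller proceed to stop the other CPUs. */
	static unsigned long stop_in_progress;

	void smp_send_stop(void)
	{
		if (test_and_set_bit(0, &stop_in_progress)) {
			/* Another CPU is already stopping everyone. */
			WARN_ON_ONCE(1);
			return;
		}

		/* ... copy cpu_online_mask into the global stop mask and
		 * send the IPIs ... */
	}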
Re: [PATCH] arm64: smp: smp_send_stop() and crash_smp_send_stop() should try non-NMI first
Hi, On Thu, Dec 7, 2023 at 5:03 PM Douglas Anderson wrote: > > When testing hard lockup handling on my sc7180-trogdor-lazor device > with pseudo-NMI enabled, with serial console enabled and with kgdb > disabled, I found that the stack crawls printed to the serial console > ended up as a jumbled mess. After rebooting, the pstore-based console > looked fine though. Also, enabling kgdb to trap the panic made the > console look fine and avoided the mess. > > After a bit of tracking down, I came to the conclusion that this was > what was happening: > 1. The panic path was stopping all other CPUs with >panic_other_cpus_shutdown(). > 2. At least one of those other CPUs was in the middle of printing to >the serial console and holding the console port's lock, which is >grabbed with "irqsave". ...but since we were stopping with an NMI >we didn't care about the "irqsave" and interrupted anyway. > 3. Since we stopped the CPU while it was holding the lock it would >never release it. > 4. All future calls to output to the console would end up failing to >get the lock in qcom_geni_serial_console_write(). This isn't >_totally_ unexpected at panic time but it's a code path that's not >well tested, hard to get right, and apparently doesn't work >terribly well on the Qualcomm geni serial driver. > > It would probably be a reasonable idea to try to make the Qualcomm > geni serial driver work better, but also it's nice not to get into > this situation in the first place. > > Taking a page from what x86 appears to do in native_stop_other_cpus(), > let's do this: > 1. First, we'll try to stop other CPUs with a normal IPI and wait a >second. This gives them a chance to leave critical sections. > 2. If CPUs fail to stop then we'll retry with an NMI, but give a much >lower timeout since there's no good reason for a CPU not to react >quickly to a NMI. > > This works well and avoids the corrupted console and (presumably) > could help avoid other similar issues. > > In order to do this, we need to do a little re-organization of our > IPIs since we don't have any more free IDs. We'll do what was > suggested in previous conversations and combine "stop" and "crash > stop". That frees up an IPI so now we can have a "stop" and "stop > NMI". > > In order to do this we also need a slight change in the way we keep > track of which CPUs still need to be stopped. We need to know > specifically which CPUs haven't stopped yet when we fall back to NMI > but in the "crash stop" case the "cpu_online_mask" isn't updated as > CPUs go down. This is why that code path had an atomic of the number > of CPUs left. We'll solve this by making the cpumask into a > global. This has a potential memory implication--with NR_CPUs = 4096 > this is 4096/8 = 512 bytes of globals. On the upside in that same case > we take 512 bytes off the stack which could potentially have made the > stop code less reliable. It can be noted that the NMI backtrace code > (lib/nmi_backtrace.c) uses the same approach and that use also > confirms that updating the mask is safe from NMI. > > All of the above lets us combine the logic for "stop" and "crash stop" > code, which appeared to have a bunch of arbitrary implementation > differences. Possibly this could make up for some of the 512 wasted > bytes. 
;-) > > Aside from the above change where we try a normal IPI and then an NMI, > the combined function has a few subtle differences: > * In the normal smp_send_stop(), if we fail to stop one or more CPUs > then we won't include the current CPU (the one running > smp_send_stop()) in the error message. > * In crash_smp_send_stop(), if we fail to stop some CPUs we'll print > the CPUs that we failed to stop instead of printing all _but_ the > current running CPU. > * In crash_smp_send_stop(), we will now only print "SMP: stopping > secondary CPUs" if (system_state <= SYSTEM_RUNNING). > > Fixes: d7402513c935 ("arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP should > try for NMI") > Signed-off-by: Douglas Anderson > --- > I'm not setup to test the crash_smp_send_stop(). I made sure it > compiled and hacked the panic() method to call it, but I haven't > actually run kexec. Hopefully others can confirm that it's working for > them. > > arch/arm64/kernel/smp.c | 115 +++- > 1 file changed, 54 insertions(+), 61 deletions(-) > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > index defbab84e9e5..9fe9d4342517 100644 > --- a/arch/arm64/kernel/smp.c > +++ b/arch/arm64/kernel/smp.c > @@ -71,7 +71,7 @@ enum ipi_msg_type { > IPI_RESCHEDULE, > IPI_CALL_FUNC, > IPI_CPU_STOP, > - IPI_CPU_CRASH_STOP, > + IPI_CPU_STOP_NMI, > IPI_TIMER, > IPI_IRQ_WORK, > NR_IPI, > @@ -88,6 +88,9 @@ static int ipi_irq_base __ro_after_init; > static int nr_ipi __ro_after_init = NR_IPI; > static struct irq_desc *ipi_desc[MAX_IPI] __ro_after_init; > > +stat
[PATCH v2] ntp: remove accidental integer wrap-around
Using syzkaller alongside the newly reintroduced signed integer overflow sanitizer spits out this report: UBSAN: signed-integer-overflow in ../kernel/time/ntp.c:461:16 9223372036854775807 + 500 cannot be represented in type 'long' Call Trace: dump_stack_lvl+0x93/0xd0 handle_overflow+0x171/0x1b0 second_overflow+0x2d6/0x500 accumulate_nsecs_to_secs+0x60/0x160 timekeeping_advance+0x1fe/0x890 update_wall_time+0x10/0x30 ... time_maxerror is unconditionally incremented and the result is checked against NTP_PHASE_LIMIT, but the increment itself can overflow, resulting in wrap-around to negative space. The user can supply some crazy values which is causing the overflow. Add an extra validation step checking that maxerror is reasonable. Link: https://github.com/llvm/llvm-project/pull/82432 [1] Closes: https://github.com/KSPP/linux/issues/354 Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt --- Changes in v2: - update commit log (thanks Thomas) - check for sane user input during validation (thanks Thomas) - Link to v1: https://lore.kernel.org/r/20240507-b4-sio-ntp-usec-v1-1-15003fc9c...@google.com --- Historically, the signed integer overflow sanitizer did not work in the kernel due to its interaction with `-fwrapv` but this has since been changed [1] in the newest version of Clang. It was re-enabled in the kernel with Commit 557f8c582a9ba8ab ("ubsan: Reintroduce signed overflow sanitizer"). Here's the syzkaller reproducer: | #{Threaded:false Repeat:false RepeatTimes:0 Procs:1 Slowdown:1 Sandbox: | #SandboxArg:0 Leak:false NetInjection:false NetDevices:false | #NetReset:false Cgroups:false BinfmtMisc:false CloseFDs:false KCSAN:false | #DevlinkPCI:false NicVF:false USB:false VhciInjection:false Wifi:false | #IEEE802154:false Sysctl:false Swap:false UseTmpDir:false | #HandleSegv:false Repro:false Trace:false LegacyOptions:{Collide:false | #Fault:false FaultCall:0 FaultNth:0}} | clock_adjtime(0x0, &(0x7f00)={0x5, 0x1, 0x40, | 0x7fff, 0x8, 0xb2, 0x256, 0x6, 0x5, 0x8001, 0x9, 0x3f, 0x0, | 0x8000, 0x800, 0x64d, 0x5, 0x7ff, 0x8001, 0x1f, 0x3, | 0xfff, 0x7fff, 0x5, 0x100, 0x4}) ... which was used against Kees' tree here (v6.8rc2): https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=wip/v6.9-rc2/unsigned-overflow-sanitizer ... with this config: https://gist.github.com/JustinStitt/824976568b0f228ccbcbe49f3dee9bf4 --- kernel/time/timekeeping.c | 5 + 1 file changed, 5 insertions(+) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index b58dffc58a8f..321f251c02aa 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -2388,6 +2388,11 @@ static int timekeeping_validate_timex(const struct __kernel_timex *txc) } } + if (txc->modes & ADJ_MAXERROR) { + if (txc->maxerror < 0 || txc->maxerror > NTP_PHASE_LIMIT) + return -EINVAL; + } + /* * Check for potential multiplication overflows that can * only happen on 64-bit systems: --- base-commit: 0106679839f7c69632b3b9833c3268c316c0a9fc change-id: 20240507-b4-sio-ntp-usec-1a3ab67bdce1 Best regards, -- Justin Stitt
Re: [PATCH v3] fs: fix unintentional arithmetic wraparound in offset calculation
Hi,

On Thu, May 16, 2024 at 6:13 PM Matthew Wilcox wrote:
>
> On Fri, May 17, 2024 at 12:29:06AM +0000, Justin Stitt wrote:
> > When running syzkaller with the newly reintroduced signed integer
> > overflow sanitizer we encounter this report:
>
> why do you keep saying it's unintentional? it's clearly intended.

Right, "unintentional" is a poor choice of phrasing. What I actually mean is:
"the overflow-checking arithmetic was done in a way that intrinsically causes
an overflow (wraparound)".

I can clearly see the intent of the code; there's even a comment saying
exactly what it does: "/* Ensure offsets don't wrap. */"...

So the thinking is: let's use the overflow-checking helpers so we can get a
good signal through the sanitizers on _real_ bugs, especially in spots with
no bounds handling.

Thanks
Justin
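The direction being described — expressing the bound check with the
overflow-checking helpers so the intent is explicit to both the reader and
the sanitizer — looks roughly like this generic sketch; it is not the actual
fs/read_write.c change, and the function name is invented.

	#include <linux/overflow.h>

	/* Sketch: reject the operation instead of relying on the comparison
	 * itself wrapping. */
	static loff_t checked_end_offset(loff_t pos, loff_t count)
	{
		loff_t end;

		if (check_add_overflow(pos, count, &end))
			return -EOVERFLOW;

		return end;
	}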
Re: [PATCH v2] wifi: mac80211: Avoid address calculations via out of bounds array indexing
On Fri, May 17, 2024 at 10:54:20AM -0400, Kenton Groombridge wrote: > req->n_channels must be set before req->channels[] can be used. > > This patch fixes one of the issues encountered in [1]. > > [ 83.964252] [ cut here ] > [ 83.964255] UBSAN: array-index-out-of-bounds in net/mac80211/scan.c:364:4 > [ 83.964258] index 0 is out of range for type 'struct ieee80211_channel *[]' > [ 83.964260] CPU: 0 PID: 1695 Comm: iwd Tainted: G OT > 6.8.9-gentoo-hardened1 #1 > [ 83.964262] Hardware name: System76 Pangolin/Pangolin, BIOS > ARB928_V00.01_T0025ASY1_ms 04/20/2023 > [ 83.964264] Call Trace: > [ 83.964267] > [ 83.964269] dump_stack_lvl+0x3f/0xc0 > [ 83.964274] __ubsan_handle_out_of_bounds+0xec/0x110 > [ 83.964278] ieee80211_prep_hw_scan+0x2db/0x4b0 > [ 83.964281] __ieee80211_start_scan+0x601/0x990 > [ 83.964284] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 83.964287] ? cfg80211_scan+0x149/0x250 > [ 83.964291] nl80211_trigger_scan+0x874/0x980 > [ 83.964295] genl_family_rcv_msg_doit+0xe8/0x160 > [ 83.964298] genl_rcv_msg+0x240/0x270 > [ 83.964301] ? __cfi_nl80211_trigger_scan+0x10/0x10 > [ 83.964302] ? __cfi_nl80211_post_doit+0x10/0x10 > [ 83.964304] ? __cfi_nl80211_pre_doit+0x10/0x10 > [ 83.964307] ? __cfi_genl_rcv_msg+0x10/0x10 > [ 83.964309] netlink_rcv_skb+0x102/0x130 > [ 83.964312] genl_rcv+0x23/0x40 > [ 83.964314] netlink_unicast+0x23b/0x340 > [ 83.964316] netlink_sendmsg+0x3a9/0x450 > [ 83.964319] __sys_sendto+0x3ae/0x3c0 > [ 83.964324] __x64_sys_sendto+0x21/0x40 > [ 83.964326] do_syscall_64+0x90/0x150 > [ 83.964329] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 83.964331] ? syscall_exit_work+0xc2/0xf0 > [ 83.964333] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 83.964335] ? syscall_exit_to_user_mode+0x74/0xa0 > [ 83.964337] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 83.964339] ? do_syscall_64+0x9c/0x150 > [ 83.964340] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 83.964342] ? syscall_exit_to_user_mode+0x74/0xa0 > [ 83.964344] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 83.964346] ? do_syscall_64+0x9c/0x150 > [ 83.964347] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 83.964349] ? do_syscall_64+0x9c/0x150 > [ 83.964351] ? srso_alias_return_thunk+0x5/0xfbef5 u> [ 83.964353] ? syscall_exit_work+0xc2/0xf0 > [ 83.964354] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 83.964356] ? syscall_exit_to_user_mode+0x74/0xa0 > [ 83.964358] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 83.964359] ? do_syscall_64+0x9c/0x150 > [ 83.964361] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 83.964362] ? do_user_addr_fault+0x488/0x620 > [ 83.964366] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 83.964367] ? 
srso_alias_return_thunk+0x5/0xfbef5 > [ 83.964369] entry_SYSCALL_64_after_hwframe+0x55/0x5d > [ 83.964372] RIP: 0033:0x6200808578d7 > [ 83.964374] Code: 00 00 90 f3 0f 1e fa 41 56 55 41 89 ce 48 83 ec 28 80 3d > 7b f7 0d 00 00 74 29 45 31 c9 45 31 c0 41 89 ca b8 2c 00 00 00 0f 05 <48> 3d > 00 f0 ff ff 77 71 48 83 c4 28 5d 41 5e c3 66 0f 1f 84 00 00 > [ 83.964375] RSP: 002b:730c4e821530 EFLAGS: 0246 ORIG_RAX: > 002c > [ 83.964378] RAX: ffda RBX: 06dbc456c570 RCX: > 6200808578d7 > [ 83.964379] RDX: 005c RSI: 06dbc45884f0 RDI: > 0004 > [ 83.964381] RBP: 0004 R08: R09: > > [ 83.964382] R10: R11: 0246 R12: > 06dbc456c480 > [ 83.964383] R13: 06dbc456c450 R14: R15: > 06dbc456c610 > [ 83.964386] > [ 83.964386] ---[ end trace ]--- > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=218810 > > v1->v2: > - Drop changes in cfg80211 as requested by Johannes > > Co-authored-by: Kees Cook > Signed-off-by: Kenton Groombridge Thanks Kenton, FWWIW, it seems unfortunate to me that the __counted_by field (n_channels) is set some distance away from the allocation of the flex-array (channels) whose bounds it checks. It seems it would be pretty easy for a bug in the code being updated here to result in an overrun. But in any case, I think this is an improvement and seems correct to me. Reviewed-by: Simon Horman
Re: [RFC] Mitigating unexpected arithmetic overflow
On Thu, May 16, 2024 at 02:51:34PM -0600, Theodore Ts'o wrote: > On Thu, May 16, 2024 at 12:48:47PM -0700, Justin Stitt wrote: > > > > It is incredibly important that the exact opposite approach is taken; > > we need to be annotating (or adding type qualifiers to) the _expected_ > > overflow cases. The omniscience required to go and properly annotate > > all the spots that will cause problems would suggest that we should > > just fix the bug outright. If only it was that easy. > > It certainly isn't easy, yes. But the problem is when you dump a huge > amount of work and pain onto kernel developers, when they haven't > signed up for it, when they don't necessarily have the time to do all > of the work themselves, and when their corporate overlords won't given > them the headcount to handle unfunded mandates which folks who are > looking for a bright new wonderful future --- don't be surprised if > kernel developers push back hard. I never claimed this to be some kind of "everyone has to stop and make these changes" event. I even talked about how it would be gradual and not break existing code (all "WARN and continue anyway"). This is what we've been doing with the array bounds work. Lots of people are helping with that, but a lot of those patches have been from Gustavo and me; we tried to keep the burden away from other developers as much as we could. > One of the big problems that we've seen with many of these security > initiatives is that the teams create these unfunded mandates get their > performance reviews based on how many "bug reports" that they file, > regardless of whether they are real problems or not. This has been a > problem with syzkaller, and with clusterfuzz. Let's not make this > problem worse with new and fancy sanitizers, please. Are you talking about *my* security initiatives? I've been doing this kind work in the kernel for 10 years, and at no time has "bug report count" been a metric. In fact, the whole goal is making it _impossible_ to have a bug. (e.g. VLA removal, switch fallthrough, etc). My drive has been to kill entire classes of bugs. The use of sanitizers isn't to just bolster fuzzers (though they're helpful for finding false positives). It's to use the sanitizers _in production_, to stop flaws from being exploitable. Android has enabled the array bounds sanitizer in trap mode for at least 2 years now. We want the kernel to be self-protective; pro-actively catching flaws. > Unfortunately, I don't get funding from my employer to clear these > kinds of reports, so when I do the work, it happens on the weekends or > late at night, on my own time, which is probably why I am so grumpy As for the work itself, like I mentioned before, most of these get fixed my Gustavo, me, and now Justin too. And many other folks step up to help out, which is great. Some get complex and other maintainers get involved too, but it's slow and steady. We're trying to reduce the frequency of the "fires" people have to scramble to deal with. The "not getting paid by Google to [fix syzkaller bugs]" part, I'm surprised by. A big part of my Google job role is the upstream work I do not only on security hardening but also on seccomp, pstore, execve/binfmt, strings, etc. I'll reach out offline to find out more details. > about this. Whether you call this "sharpening our focus", or "year of > efficiency", or pick your corporate buzzwords, it really doesn't > matter. 
The important thing is that the figure of merit must NOT be > "how many security bugs that are found", but how much bullsh*t noise > do these security features create, and how do you decrease overhead by > upstream developers to deal with the fuzzing/ubsan/security tools > find. I guess I *do* worry about bug counts, but only in that I want them to be _zero_. I know other folks aren't as adamant about eliminating bug classes, but it's really not hyperbole that bugs in Linux kill people. If you think I'm engaging in corporate buzzword shenanigans, then I have a lot more work to do on explaining the rationale behind the security hardening efforts. All this said, yes, I hear what you (and Linus and others) have been saying about minimizing the burden on other developers. I have tried my best to keep it limited, but some things are more front-and-center (like unexpected integer overflows), so that's why I wanted to get feedback on how to roll it out -- I didn't see a way to make these changes in a way that would be as unintrusive(?) as our prior efforts. It has felt like the biggest pain point has been helping developers shift their perspective about C: it has been more than a fancy assembler for several decades, and we can lean on those features (and create new ones) that shift the burden of correctness to the compiler from the human. This does mean we need to change some styles to be more "observable" (and unambiguous) for the compiler, though. I think a great example recently was Peter's work creating "cleanup.h". This feature tot
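As a concrete illustration of the cleanup.h style referred to above (a
generic example, not taken from any particular driver): the lock is released
automatically when the guard goes out of scope, so the compiler enforces what
used to be a manual convention.

	#include <linux/cleanup.h>
	#include <linux/mutex.h>

	static DEFINE_MUTEX(my_lock);
	static int counter;

	static int bump_counter(void)
	{
		/* Unlocked automatically on every return path. */
		guard(mutex)(&my_lock);
		return ++counter;
	}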
Re: [PATCH v3] fs: fix unintentional arithmetic wraparound in offset calculation
On Fri, May 17, 2024 at 02:26:47AM +0100, Al Viro wrote: > On Fri, May 17, 2024 at 02:13:22AM +0100, Matthew Wilcox wrote: > > On Fri, May 17, 2024 at 12:29:06AM +, Justin Stitt wrote: > > > When running syzkaller with the newly reintroduced signed integer > > > overflow sanitizer we encounter this report: > > > > why do you keep saying it's unintentional? it's clearly intended. > > Because they are short on actual bugs to be found by their tooling > and attempt to inflate the sound/noise rate; therefore, every time "short on bugs"? We're trying to drive it to zero. I would *love* to be short on bugs. See my reply[1] to Ted. > when overflow _IS_ handled correctly, it must have been an accident - > we couldn't have possibly done the analysis correctly. And if somebody > insists that they _are_ capable of basic math, they must be dishonest. > So... "unintentional" it's going to be. As Justin said, this is a poor choice in wording. In other cases I've tried to describe this as making changes so that intent is unambiguous (to both a human and a compiler). > Math is hard, mmkay? > > Al, more than slightly annoyed by that aspect of the entire thing... I'm sorry about that. None of this is a commentary on code correctness; we're just trying to refactor things so that the compiler can help us catch the _unintended_ overflows. This one is _intended_, so here we are to find a palatable way to leave the behavior unchanged while gaining compiler coverage. -Kees [1] https://lore.kernel.org/linux-hardening/202405171329.019F2F566C@keescook/ -- Kees Cook
Re: [RFC] Mitigating unexpected arithmetic overflow
On Thu, May 16, 2024 at 12:49 PM Justin Stitt wrote: > > Hi, > > On Thu, May 16, 2024 at 7:09 AM Peter Zijlstra wrote: > > > > On Thu, May 16, 2024 at 06:30:32AM -0700, Kees Cook wrote: > > > > > > I am a broken record. :) This is _not_ about undefined behavior. > > > > And yet you introduced CONFIG_UBSAN_SIGNED_WRAP... *UB*san, get it? > > We should think of UBSAN as an "Unexpected Behavior" Sanitizer. Clang > has evolved to provide instrumentation for things that are not > *technically* undefined behavior. > > Go to [1] and grep for some phrases like "not undefined behavior" and > see that we have quite a few sanitizers that aren't *technically* > handling undefined behavior. > > > > > > This is about finding a way to make the intent of C authors > > > unambiguous. That overflow wraps is well defined. It is not always > > > _desired_. C has no way to distinguish between the two cases. > > > > The current semantics are (and have been for years, decades at this > > point) that everything wraps nicely and code has been assuming this. You > > cannot just change this. > > Why not :>) > > Lots and lots of exploits are caused by unintentional arithmetic overflow [2]. > > > > > So what you do is do a proper language extension and add a type > > qualifier that makes overflows trap and annotate all them cases where > > people do not expect overflows (so that we can put the > > __builtin_*_overflow() things where the sun don't shine). > > It is incredibly important that the exact opposite approach is taken; > we need to be annotating (or adding type qualifiers to) the _expected_ > overflow cases. The omniscience required to go and properly annotate > all the spots that will cause problems would suggest that we should > just fix the bug outright. If only it was that easy. > > I don't think we're capable of identifying every single problematic > overflow/wraparound case in the kernel, this is pretty obvious > considering we've had decades to do so. Instead, it seems much more > feasible that we annotate (very, very minimally so as not to disrupt > code readability and style) the spots where we _know_ overflow should > happen. > > [1]: https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#ubsan-checks > [2]: https://cwe.mitre.org/data/definitions/190.html > > Thanks > Justin FWIW I have made -fsanitize=undefined -fwrapv not imply -fsanitize=signed-integer-overflow in https://github.com/llvm/llvm-project/pull/85501 . The change is not included in the LLVM 18.1.* releases. This might encourage projects using both -fwrapv and -fsanitize=undefined to re-evaluate whether -fwrapv aligns with their needs. The description of #85501 contains more information about my understanding of kernel security folks' mind: > Linux kernel uses -fwrapv to change signed integer overflows from undefined > behaviors to defined behaviors. However, the security folks still want > -fsanitize=signed-integer-overflow diagnostics. Their intention can be > expressed with -fwrapv -fsanitize=signed-integer-overflow (#80089). This mode > by default reports recoverable errors while still making signed integer > overflows defined (most UBSan checks are recoverable by default: you get > errors in stderr, but the program is not halted). -- 宋方睿
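A small user-space illustration of the combination described above, assuming
a Clang new enough to contain that change, so that -fwrapv defines the
wrapping result while -fsanitize=signed-integer-overflow still reports it:

	/* Build with: clang -fwrapv -fsanitize=signed-integer-overflow demo.c */
	#include <limits.h>
	#include <stdio.h>

	int main(void)
	{
		int a = INT_MAX;

		/* With -fwrapv this is defined and wraps to INT_MIN, but the
		 * sanitizer still emits a (recoverable) runtime report. */
		int b = a + 1;

		printf("%d\n", b);
		return 0;
	}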
Re: [PATCH v2] ntp: safeguard against time_constant overflow case
Justin!

On Fri, May 17 2024 at 00:47, Justin Stitt wrote:
> 	if (txc->modes & ADJ_TIMECONST) {
> -		time_constant = txc->constant;
> -		if (!(time_status & STA_NANO))
> +		if (!(time_status & STA_NANO) && time_constant < MAXTC)
> 			time_constant += 4;
> 		time_constant = min(time_constant, (long)MAXTC);
> 		time_constant = max(time_constant, 0l);

Let me digest this. The original code does:

	time_constant = txc->constant;
	if (!(time_status & STA_NANO))
		time_constant += 4;
	time_constant = min(time_constant, (long)MAXTC);
	time_constant = max(time_constant, 0l);

Your change results in:

	if (!(time_status & STA_NANO) && time_constant < MAXTC)
		time_constant += 4;
	time_constant = min(time_constant, (long)MAXTC);
	time_constant = max(time_constant, 0l);

IOW, you lost the intent of the code, which is to assign the user space
supplied value of txc->constant.

Aside of that, you clearly failed to map the deep analysis I provided to you
on the time_maxerror issue to this one:

  # git grep 'time_constant.*=' kernel/time/
  ntp.c:66:static long			time_constant = 2;

That's the static initializer.

  kernel/time/ntp.c:736:	time_constant = txc->constant;
  kernel/time/ntp.c:738:		time_constant += 4;
  kernel/time/ntp.c:739:	time_constant = min(time_constant, (long)MAXTC);
  kernel/time/ntp.c:740:	time_constant = max(time_constant, 0l);

That's the part of process_adjtimex_modes() you are trying to "fix". So it's
exactly the same problem as with time_maxerror, no?

And therefore you provide a "safeguard" against overflow for the price of
making the syscall dysfunctional. Seriously?

Did you even try to run something other than the bad-case reproducer against
your fix? No. You did not. Any of the related real use case tests would have
failed.

I told you yesterday:

  Tools are good to pin-point symptoms, but they are by definition
  patently bad at root cause analysis. Otherwise we could just let the
  tool write the "fix".

Such a tool would have at least produced a correct "fix" to cure the symptom.

Thanks,

        tglx
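For illustration, a fix in the spirit of the time_maxerror analysis above
would leave process_adjtimex_modes() untouched and bound the user-supplied
value at the ABI boundary instead. A sketch (not a posted patch) of such a
check in timekeeping_validate_timex() might look like:

	/* Sketch only: reject a nonsensical user-supplied time constant at
	 * the boundary, mirroring the maxerror validation. */
	if (txc->modes & ADJ_TIMECONST) {
		if (txc->constant < 0 || txc->constant > MAXTC)
			return -EINVAL;
	}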
Re: [RFC] Mitigating unexpected arithmetic overflow
On Fri, May 17, 2024 at 02:15:01PM -0700, Kees Cook wrote:
> On Thu, May 16, 2024 at 02:51:34PM -0600, Theodore Ts'o wrote:
> > On Thu, May 16, 2024 at 12:48:47PM -0700, Justin Stitt wrote:
> > >
> > > It is incredibly important that the exact opposite approach is taken;
> > > we need to be annotating (or adding type qualifiers to) the _expected_
> > > overflow cases. The omniscience required to go and properly annotate
> > > all the spots that will cause problems would suggest that we should
> > > just fix the bug outright. If only it was that easy.
> >
> > It certainly isn't easy, yes. But the problem is when you dump a huge
> > amount of work and pain onto kernel developers, when they haven't
> > signed up for it, when they don't necessarily have the time to do all
> > of the work themselves, and when their corporate overlords won't given
> > them the headcount to handle unfunded mandates which folks who are
> > looking for a bright new wonderful future --- don't be surprised if
> > kernel developers push back hard.
>
> I never claimed this to be some kind of "everyone has to stop and make
> these changes" event. I even talked about how it would be gradual and
> not break existing code (all "WARN and continue anyway"). This is what
> we've been doing with the array bounds work. Lots of people are helping
> with that, but a lot of those patches have been from Gustavo and me; we
> tried to keep the burden away from other developers as much as we could.

Kees, sorry, I wasn't intending to criticize your work; I know you've been
careful. What I was reacting to was Justin's comment that it's important to
annotate every single expected overflow case. This sounded like the very
common attitude of "type II errors must be avoided at all costs", even if
that means a huge number of "type I errors". And false positive errors
invariably end up requiring work on the part of over-worked maintainers.

From my perspective, it's OK if we don't find all possible security bugs if
we can keep the false positive rate down to near-zero. Because if it's too
high, I will just start ignoring all of the "security reports", just out of
self-defense.

This is now the case with clusterfuzz reports, for example. The attempt to
shame/pressure me with "fix this RIGHT NOW or we will make the report public
in 30 days" no longer works on me, because the vast majority of the reports
against e2fsprogs are bullshit. And there are many syzkaller reports which
are also bullshit, and when they are reported against our data center
kernels, they get down-prioritized to P3 and ignored, because they can't
happen in real life. But they still get reported to upstream (me).

This is why I am worried when there are proposals to add new sanitizers; even
if they are used responsibly by you, I can easily see them resulting in more
clusterfuzz reports (for example) that I will have to ignore as bullshit. So
please consider this a plea to **really** seriously reduce the false positive
rate of sanitizers whenever possible.

Cheers,

					- Ted
Re: [RFC] Mitigating unexpected arithmetic overflow
On Wed, May 08, 2024 at 11:11:35PM -0700, Kees Cook wrote:
> Or even just simple math between u8s:
>
>         while (xas->xa_offset == 255) {
>                 xas->xa_offset = xas->xa_node->offset - 1;
>
> Is it expecting to wrap (truncate)? How is it distinguished from
> the "num_elems++" case?

Hi ;-)

This looks to me like it's walking up the tree, looking to go to the 'prev'
element. So yes, the wrapping from 0 to 255 is intentional, because once we
find an offset that's not 0, we've set offset to the right one.

Just to give more context, here's the few lines around it:

	if (xas->xa_offset != get_offset(xas->xa_index, xas->xa_node))
		xas->xa_offset--;
	while (xas->xa_offset == 255) {
		xas->xa_offset = xas->xa_node->offset - 1;
		xas->xa_node = xa_parent(xas->xa, xas->xa_node);
		if (!xas->xa_node)
			return set_bounds(xas);
	}

So it's kind of nasty. I don't want to write it as:

	while (xas->xa_offset == 0) {
		xas->xa_offset = xas->xa_node->offset;
		xas->xa_node = xa_parent(xas->xa, xas->xa_node);
	}
	xas->xa_offset--;

because there's that conditional subtraction before the loop starts. The
xarray test-suite is pretty good if anyone thinks they have a clearer way to
write this loop ...
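The wrap being relied on here is just the ordinary, defined behaviour of
unsigned arithmetic; a trivial user-space demonstration of the xa_offset--
case (file and variable names are illustrative):

	/* Build with any C compiler. */
	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		uint8_t offset = 0;

		offset--;			/* 0 - 1 wraps to 255 by definition */
		printf("%u\n", offset);		/* prints 255 */
		return 0;
	}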