On 2020/11/3 3:41, Jakub Kicinski wrote: > On Mon, 2 Nov 2020 11:14:32 +0800 Yunsheng Lin wrote: >> On 2020/11/1 6:38, Jakub Kicinski wrote: >>> On Thu, 29 Oct 2020 19:34:48 +0800 Yunsheng Lin wrote: >>>> The current semantic for napi_consume_skb() is that caller need >>>> to provide non-zero budget when calling from NAPI context, and >>>> breaking this semantic will cause hard to debug problem, because >>>> _kfree_skb_defer() need to run in atomic context in order to push >>>> the skb to the particular cpu' napi_alloc_cache atomically. >>>> >>>> So add a in_softirq() debug checking in napi_consume_skb() to catch >>>> this kind of error. >>>> >>>> Suggested-by: Eric Dumazet <eduma...@google.com> >>>> Signed-off-by: Yunsheng Lin <linyunsh...@huawei.com> >>> >>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c >>>> index 1ba8f01..1834007 100644 >>>> --- a/net/core/skbuff.c >>>> +++ b/net/core/skbuff.c >>>> @@ -897,6 +897,10 @@ void napi_consume_skb(struct sk_buff *skb, int budget) >>>> return; >>>> } >>>> >>>> + DEBUG_NET_WARN(!in_softirq(), >>>> + "%s is called with non-zero budget outside softirq >>>> context.\n", >>>> + __func__); >>> >>> Can't we use lockdep instead of defining our own knobs? >> >> From the first look, using the below seems better than defining our >> own knobs, because there is similar lockdep_assert_in_irq() checking >> already and lockdep_assert_in_*() is NULL-OP when CONFIG_PROVE_LOCKING >> is not defined. >> >>> >>> Like this maybe? >>> >>> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h >>> index f5594879175a..5253a167d00c 100644 >>> --- a/include/linux/lockdep.h >>> +++ b/include/linux/lockdep.h >>> @@ -594,6 +594,14 @@ do { >>> \ >>> this_cpu_read(hardirqs_enabled))); \ >>> } while (0) >>> >>> +#define lockdep_assert_in_softirq() \ >>> +do { \ >>> + WARN_ON_ONCE(__lockdep_enabled && \ >>> + (softirq_count() == 0 || \ >>> + this_cpu_read(hardirq_context))); \ >> >> Using in_softirq() seems more obvious then using softirq_count()? >> And there is below comment above avoiding the using of in_softirq(), maybe >> that is why you use softirq_count() directly here? >> "softirq_count() == 0" still mean we are not in the SoftIRQ context and >> BH is not disabled. right? Perhap lockdep_assert_in_softirq_or_bh_disabled() >> is more obvious? > > Let's add Peter to the recipients to get his opinion. > > We have a per-cpu resource which is also accessed from BH (see > _kfree_skb_defer()). > > We're trying to come up with the correct lockdep incantation for it.
Hi, Peter Any suggestion? > >> /* >> * Are we doing bottom half or hardware interrupt processing? >> * >> * in_irq() - We're in (hard) IRQ context >> * in_softirq() - We have BH disabled, or are processing softirqs >> * in_interrupt() - We're in NMI,IRQ,SoftIRQ context or have BH disabled >> * in_serving_softirq() - We're in softirq context >> * in_nmi() - We're in NMI context >> * in_task() - We're in task context >> * >> * Note: due to the BH disabled confusion: in_softirq(),in_interrupt() really >> * should not be used in new code. >> */ >> >> >> Also, is there any particular reason we do the >> "this_cpu_read(hardirq_context)" >> checking? > > Accessing BH resources from a hard IRQ context would be a bug, we may > have interrupted a BH, so AFAIU softirq_count() != 0, but we can nest > calls to _kfree_skb_defer(). In that case, maybe just call lockdep_assert_in_irq() is enough? > . >