On Sun, Sep 23, 2018 at 12:29 PM David Miller <da...@davemloft.net> wrote:
>
> From: Eric Dumazet <eduma...@google.com>
> Date: Fri, 21 Sep 2018 15:27:37 -0700
>
> > As diagnosed by Song Liu, ndo_poll_controller() can
> > be very dangerous on loaded hosts, since the cpu
> > calling ndo_poll_controller() might steal all NAPI
> > contexts (for all RX/TX queues of the NIC).
> >
> > This capture, showing one ksoftirqd eating all cycles,
> > can last for an unlimited amount of time, since one
> > cpu is generally not able to drain all the queues under load.
> >
> > It seems that all networking drivers that do use NAPI
> > for their TX completions should not provide an ndo_poll_controller():
> >
> > Most NAPI drivers have netpoll support already handled
> > in core networking stack, since netpoll_poll_dev()
> > uses poll_napi(dev) to iterate through registered
> > NAPI contexts for a device.
>
> I'm having trouble understanding the difference.
>
> If the drivers are processing all of the RX/TX queue draining by hand
> in their ndo_poll_controller() method, how is that different from the
> generic code walking all of the registered NAPI instances one by one?

(resent in plain text mode this time)

Reading poll_napi() and poll_one_napi(), I thought that we were using
NAPI_STATE_NPSVC and cmpxchg(&napi->poll_owner, -1, cpu) to
_temporarily_ [1] own each napi, one at a time.
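
From memory (so treat this as a rough sketch of net/core/netpoll.c
rather than the exact code), the flow looks like:

/* poll_napi(): walk the device napi list, claiming each napi only
 * for the duration of one poll_one_napi() call.
 */
list_for_each_entry(napi, &dev->napi_list, dev_list) {
        if (cmpxchg(&napi->poll_owner, -1, cpu) == -1) {
                poll_one_napi(napi);
                smp_store_release(&napi->poll_owner, -1);
        }
}

/* poll_one_napi(): NAPI_STATE_NPSVC guards against recursion, and
 * the zero budget restricts the callback to TX completions.
 */
if (test_and_set_bit(NAPI_STATE_NPSVC, &napi->state))
        return;
work = napi->poll(napi, 0);
clear_bit(NAPI_STATE_NPSVC, &napi->state);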

But I do see we also have this part at the beginning of poll_one_napi():

if (!test_bit(NAPI_STATE_SCHED, &napi->state))
      return;

So we probably should remove this test. (The normal napi->poll() calls
would not proceed anyway, since napi->poll_owner would not be -1.)
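
For reference, the exclusion on the normal path comes from
netpoll_poll_lock(), which napi_poll() takes before invoking ->poll();
roughly (again from memory, not a verbatim copy):

/* While netpoll holds napi->poll_owner, this cmpxchg fails and the
 * softirq path has to wait before it can run the driver ->poll().
 */
static inline void *netpoll_poll_lock(struct napi_struct *napi)
{
        struct net_device *dev = napi->dev;

        if (dev && dev->npinfo) {
                int owner = smp_processor_id();

                while (cmpxchg(&napi->poll_owner, -1, owner) != -1)
                        cpu_relax();

                return napi;
        }
        return NULL;
}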

[1]
Whereas if a cpu succeeds in setting NAPI_STATE_SCHED, it has to own
the napi for as long as napi->poll() does not call
napi_complete_done(), and this can be forever (the capture effect).
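
Put differently, in a simplified sketch of napi_schedule() /
napi_complete_done() (not the exact upstream code):

/* Whoever wins this test_and_set_bit() owns the napi ... */
if (!test_and_set_bit(NAPI_STATE_SCHED, &napi->state))
        __napi_schedule(napi);

/* ... and NAPI_STATE_SCHED is only cleared when the driver ->poll()
 * consumes less than its budget and calls napi_complete_done().
 * Under sustained load that point may never be reached, so the cpu
 * that scheduled the napi keeps servicing it indefinitely.
 */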

Basically calling napi_schedule() is the dangerous part.

I believe busy polling and netpoll are similar intruders (as they can
run on arbitrary cpus), but netpoll is far more problematic since it
iterates through all RX/TX queues of the device.
