On Sun, Sep 23, 2018 at 12:29 PM David Miller <da...@davemloft.net> wrote:
>
> From: Eric Dumazet <eduma...@google.com>
> Date: Fri, 21 Sep 2018 15:27:37 -0700
>
> > As diagnosed by Song Liu, ndo_poll_controller() can
> > be very dangerous on loaded hosts, since the cpu
> > calling ndo_poll_controller() might steal all NAPI
> > contexts (for all RX/TX queues of the NIC).
> >
> > This capture, showing one ksoftirqd eating all cycles
> > can last for unlimited amount of time, since one
> > cpu is generally not able to drain all the queues under load.
> >
> > It seems that all networking drivers that do use NAPI
> > for their TX completions, should not provide a ndo_poll_controller() :
> >
> > Most NAPI drivers have netpoll support already handled
> > in core networking stack, since netpoll_poll_dev()
> > uses poll_napi(dev) to iterate through registered
> > NAPI contexts for a device.
>
> I'm having trouble understanding the difference.
>
> If the drivers are processing all of the RX/TX queue draining by hand
> in their ndo_poll_controller() method, how is that different from the
> generic code walking all of the registered NAPI instances one by one?

(resent in plain text mode this time)

Reading poll_napi() and poll_one_napi(), I thought that we were using
NAPI_STATE_NPSVC and cmpxchg(&napi->poll_owner, -1, cpu) to
_temporarily_ [1] own one napi at a time.

But I do see we also have this part at the beginning of poll_one_napi() :

	if (!test_bit(NAPI_STATE_SCHED, &napi->state))
		return;

So we probably should remove it. (The normal napi->poll() calls would
not proceed since napi->poll_owner would not be -1)

[1] Whereas if a cpu succeeds in setting NAPI_STATE_SCHED, it has to
own the napi for as long as napi->poll() does not call
napi_complete_done(), and this can be forever (the capture effect).

Basically, calling napi_schedule() is the dangerous part.

I believe busy_polling and netpoll are the same kind of intruders (as
they can run on arbitrary cpus). But netpoll is far more problematic
since it iterates through all RX/TX queues.
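
To make the distinction between the two kinds of ownership concrete, here is a small userspace toy model. It is only a sketch and not kernel code: struct toy_napi, the toy_*() helpers, and the budget/arrival numbers are all made up for illustration, and the "complete when less than budget was processed" rule is a simplification of what drivers do around napi_complete_done().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */
struct toy_napi {
	atomic_int  poll_owner;	/* -1 when nobody is inside a poll call */
	atomic_bool sched;	/* stands in for NAPI_STATE_SCHED */
	int         backlog;	/* packets waiting on this queue */
};

/* Temporary ownership: only taken if nobody else holds it (cmpxchg style). */
static bool toy_try_own(struct toy_napi *n, int cpu)
{
	int expected = -1;

	return atomic_compare_exchange_strong(&n->poll_owner, &expected, cpu);
}

/* Give temporary ownership back; only the current owner may do so. */
static void toy_release(struct toy_napi *n, int cpu)
{
	assert(atomic_load(&n->poll_owner) == cpu);
	atomic_store(&n->poll_owner, -1);
}

/* Sticky ownership: returns true if this caller set the SCHED-like bit. */
static bool toy_schedule(struct toy_napi *n)
{
	return !atomic_exchange(&n->sched, true);
}

/*
 * One poll round with a budget. The SCHED-like bit is cleared only when
 * less than a full budget was processed. Returns true if still scheduled.
 */
static bool toy_poll(struct toy_napi *n, int budget)
{
	int done = n->backlog < budget ? n->backlog : budget;

	n->backlog -= done;
	if (done < budget) {
		atomic_store(&n->sched, false);
		return false;
	}
	return true;
}

/*
 * Schedule the napi, then keep polling while 'arrivals' new packets show
 * up before each round. Returns the number of rounds the caller was stuck
 * with it, capped at max_rounds.
 */
static int toy_rounds_held(struct toy_napi *n, int budget, int arrivals,
			   int max_rounds)
{
	int rounds = 0;

	if (!toy_schedule(n))
		return 0;
	while (rounds < max_rounds) {
		n->backlog += arrivals;
		rounds++;
		if (!toy_poll(n, budget))
			break;
	}
	return rounds;
}
```

In this model toy_try_own()/toy_release() bracket a single poll call, so the owner changes hands after every call. With toy_rounds_held(), a caller that meets a queue receiving a full budget of packets per round never gets to clear the bit and runs until the cap, which is the capture effect described in [1]. A lightly loaded queue is released after one round.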