This is an implementation of the idea suggested by Jakub here:
https://lore.kernel.org/netdev/[email protected]/

ndo_set_rx_mode is problematic because it cannot sleep.

To address this, this series proposes dividing the concept of setting
rx_mode into 2 stages: snapshot and deferred I/O. To achieve this, we
reinterpret set_rx_mode and create a new ndo, write_rx_mode, as
explained below:

The new set_rx_mode will be responsible for customizing the rx_mode
snapshot, which will be used by write_rx_mode to update the hardware.

In brief, the new flow looks something like:

prepare_rx_mode():
	ndo_set_rx_mode();
	prepare_snapshot();

write_rx_mode():
	use_ready_snapshot();
	ndo_write_rx_mode();

write_rx_mode() is called from a work item and doesn't hold
netif_addr_lock during ndo_write_rx_mode(), so that section is
allowed to sleep.
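
The two stages can be sketched in userspace as follows. This is only an illustration of the flow described above, not code from the series: struct sketch_dev, struct rx_snapshot and every sketch_* function are hypothetical names, and a plain bool stands in for netif_addr_lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from the series. */
struct rx_snapshot {
	unsigned int flags;	/* e.g. promisc/allmulti style bits */
	int n_addrs;		/* number of filter addresses */
};

struct sketch_dev {
	bool addr_lock_held;		/* models netif_addr_lock */
	unsigned int flags;		/* live rx_mode state */
	int n_addrs;
	struct rx_snapshot pending;	/* snapshot being customized */
	struct rx_snapshot ready;	/* snapshot handed to the writer */
	struct rx_snapshot hw;		/* what the "hardware" was told */
};

static void sketch_lock(struct sketch_dev *d)
{
	assert(!d->addr_lock_held);
	d->addr_lock_held = true;
}

static void sketch_unlock(struct sketch_dev *d)
{
	assert(d->addr_lock_held);
	d->addr_lock_held = false;
}

/* Stage 1: runs under the lock and therefore must not sleep. */
static void sketch_prepare_rx_mode(struct sketch_dev *d)
{
	sketch_lock(d);
	/* ndo_set_rx_mode() step: the driver customizes the snapshot. */
	d->pending.flags = d->flags;
	d->pending.n_addrs = d->n_addrs;
	/* prepare_snapshot() step: publish it for the writer. */
	d->ready = d->pending;
	sketch_unlock(d);
}

/* Stage 2: would run from a work item in the real model. */
static void sketch_write_rx_mode(struct sketch_dev *d)
{
	struct rx_snapshot local;

	/* use_ready_snapshot() step: copy under the lock, then drop it. */
	sketch_lock(d);
	local = d->ready;
	sketch_unlock(d);

	/* ndo_write_rx_mode() step: lock not held, so sleeping I/O is fine. */
	assert(!d->addr_lock_held);
	d->hw = local;
}
```

Calling sketch_prepare_rx_mode() alone leaves the hw field untouched; only the later sketch_write_rx_mode() call applies the snapshot, and it does so without the simulated lock held.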

This model should work correctly if the following conditions hold:

1. write_rx_mode should use the rx_mode set by the most recent
   call to make_snapshot_ready before its execution.

2. If a make_snapshot_ready call happens during execution of write_rx_mode,
   write_rx_mode should be rescheduled.

3. All calls to modify rx_mode should pass through the prepare_rx_mode +
   schedule write_rx_mode execution flow. netif_rx_mode_schedule_work
   has been implemented in core for this purpose.

1 and 2 are implemented in core.
Drivers need to ensure 3 using netif_rx_mode_schedule_work.
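
One way to picture conditions 1 and 2 is a generation counter that the writer samples before doing its I/O. The sketch below is a single-threaded userspace model under that assumption; it does not claim to match how core implements them. struct resched_ctx and all sketch_* names are hypothetical, and the "during" hook only simulates a caller racing with the write.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; the series may track this state differently. */
struct resched_ctx {
	unsigned long ready_gen;	/* bumped when a snapshot is made ready */
	unsigned long written_gen;	/* generation last written to hardware */
	int ready_val;			/* stands in for the ready snapshot */
	int hw_val;			/* stands in for hardware state */
	bool work_pending;		/* models a queued work item */
};

static void sketch_make_snapshot_ready(struct resched_ctx *c, int val)
{
	c->ready_val = val;
	c->ready_gen++;
}

/* Simulated concurrent caller that publishes a newer snapshot. */
static void sketch_concurrent_update(struct resched_ctx *c)
{
	sketch_make_snapshot_ready(c, 42);
}

/* Work function; "during" runs while the slow I/O would be in progress. */
static void sketch_write_work(struct resched_ctx *c,
			      void (*during)(struct resched_ctx *))
{
	/* Condition 1: sample the most recent ready snapshot up front. */
	unsigned long gen = c->ready_gen;
	int val = c->ready_val;

	c->work_pending = false;
	if (during)
		during(c);

	c->hw_val = val;
	c->written_gen = gen;

	/* Condition 2: a newer snapshot appeared meanwhile, so run again. */
	if (c->ready_gen != gen)
		c->work_pending = true;
}

static void sketch_run_until_idle(struct resched_ctx *c)
{
	while (c->work_pending)
		sketch_write_work(c, NULL);
}
```

If a snapshot is published while sketch_write_work() is running, the first pass still writes the older value, marks the work pending again, and the next pass brings the hardware state up to date.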

To use this model, a driver needs to implement the
ndo_write_rx_mode callback, change the set_rx_mode callback
appropriately, and replace all calls to modify rx_mode with
netif_rx_mode_schedule_work.

Signed-off-by: I Viswanath <[email protected]>
---
In v5, apart from the bug with netif_rx_mode_flush_work, this line of code in
netif_free_rx_mode_ctx was problematic:

	cancel_work_sync(&dev->rx_mode_ctx->rx_mode_work);

The problem was that this function ran as part of dev_close(), and hence the
RTNL lock is held while it is waiting for netif_rx_mode_write_active(), which
needs to grab the RTNL lock. If the work function was scheduled before a call
to dev_close(), we are guaranteed a deadlock.

The solution to this is cancelling the work in a context that doesn't hold the
RTNL lock. The only existing function in the teardown path that did this was
free_netdev, and it isn't ideal to do the cleanup there.

My solution was to introduce a new struct netif_deferred_work_cleanup and a new
net_device member, deferred_work_cleanup. deferred_work_cleanup will be a work
item (along with a ptr to dev) scheduled by dev_close() that will execute the
cleanup functions that require the RTNL lock to not be held.

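
The ordering problem and the deferred fix can be modelled in userspace roughly as below. This is a simplified sketch, not the code in the series: struct cleanup_sim and the sim_* functions are hypothetical, locks are plain bools, and a cancel that would block forever is reported as a false return instead of actually hanging.

```c
#include <stdbool.h>

/* Hypothetical single-threaded model of the teardown ordering. */
struct cleanup_sim {
	bool rtnl_held;		/* models the RTNL lock */
	bool rx_work_in_flight;	/* rx_mode work that needs RTNL to finish */
	bool cleanup_queued;	/* models the deferred cleanup work item */
	bool ctx_freed;		/* rx_mode context released */
};

/*
 * Models cancel_work_sync(): returns false where the real call would wait
 * forever, i.e. the caller holds RTNL and the work needs RTNL to complete.
 */
static bool sim_cancel_work_sync(struct cleanup_sim *s)
{
	if (s->rx_work_in_flight && s->rtnl_held)
		return false;
	s->rx_work_in_flight = false;
	return true;
}

/* Close path: runs under RTNL, so it only queues the cleanup. */
static void sim_dev_close(struct cleanup_sim *s)
{
	s->rtnl_held = true;
	s->cleanup_queued = true;
	s->rtnl_held = false;
}

/* Deferred cleanup: runs later from a work item, without RTNL held. */
static bool sim_run_deferred_cleanup(struct cleanup_sim *s)
{
	if (!s->cleanup_queued)
		return false;
	s->cleanup_queued = false;
	if (!sim_cancel_work_sync(s))
		return false;
	s->ctx_freed = true;
	return true;
}
```

In this model, cancelling with the simulated RTNL held fails whenever rx_mode work is in flight, while the deferred path always reaches the cancel with the lock released.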
v1:
Link:
https://lore.kernel.org/netdev/[email protected]/
v2:
- Exported set_and_schedule_rx_config as a symbol for use in modules
- Fixed incorrect cleanup for the case of rx_work alloc failing in
alloc_netdev_mqs
- Removed the locked version (cp_set_rx_mode) and renamed __cp_set_rx_mode to
cp_set_rx_mode
Link:
https://lore.kernel.org/netdev/[email protected]/
v3:
- Added RFT tag
- Corrected mangled patch
Link:
https://lore.kernel.org/netdev/[email protected]/
v4:
- Completely reworked the snapshot mechanism as per v3 comments
- Implemented the callback for virtio-net instead of 8139cp driver
- Removed RFC tag
Link:
https://lore.kernel.org/netdev/[email protected]/
v5:
- Fixed broken code and titles
- Removed RFT tag
Link:
https://lore.kernel.org/netdev/[email protected]/
v6:
- Added struct netif_deferred_work_cleanup and members needs_deferred_cleanup
and deferred_work_cleanup in net_device
- Moved out ctrl bits from netif_rx_mode_config to netif_rx_mode_work_ctx
I Viswanath (2):
net: refactor set_rx_mode into snapshot and deferred I/O
virtio-net: Implement ndo_write_rx_mode callback
drivers/net/virtio_net.c | 55 +++-----
include/linux/netdevice.h | 113 +++++++++++++++-
net/core/dev.c | 270 +++++++++++++++++++++++++++++++++++++-
3 files changed, 396 insertions(+), 42 deletions(-)
--
2.47.3