On 27. 11. 18 23:51, David Miller wrote:
From: Petr Oros <po...@redhat.com>
Date: Thu, 22 Nov 2018 15:24:07 +0100
@@ -4700,8 +4700,11 @@ int be_update_queues(struct be_adapter *adapter)
struct net_device *netdev = adapter->netdev;
int status;
- if (netif_running(netdev))
+ if (netif_running(netdev)) {
+ /* prevent netdev watchdog during tx queue destroy */
+ netif_carrier_off(netdev);
be_close(netdev);
+ }
This will make userspace networking daemons will think that the link
went down.
This absolutely should not be a side effect of making a simple
TX queue configuration change via ethtool.
Yes, you're right Dave. But the same thing (netif_carrier_off()) is done by
number of other drivers (igb, tg3, ixgbe...) during .set_channels() callback.
The patch that Petr sent does the practically the same thing like this one:
commit c0dfb90e5b2d41c907de9b624657a6688541837e
Author: John Fastabend <john.r.fastab...@intel.com>
Date: Tue Apr 27 02:13:39 2010 +0000
ixgbe: ixgbe_down needs to stop dev_watchdog
There is a small race between when the tx queues are stopped
and when netif_carrier_off() is called in ixgbe_down. If the
dev_watchdog() timer fires during this time it is possible for
a false tx timeout to occur.
This patch moves the netif_carrier_off() so that it is called before
the tx queues are stopped preventing the dev_watchdog timer from
detecting false tx timeouts. The race is seen occosionally when
FCoE or DCB settings are being configured or changed.
Testing note, running ifconfig up/down will not reproduce this
issue because dev_open/dev_close call dev_deactivate() and then
dev_activate().
where netif_carrier_off() is called prior netif_tx_disable() from ixgbe_down()
to avoid false Tx timeout. And ixgbe_down() is called from ixgbe_set_channels()
this way:
ixgbe_set_channels()->ixgbe_setup_tc()->ixgbe_close()->ixgbe_close_suspend()->ixgbe_down()
As I said the similar approach is used by other drivers as well.
The mlx4 driver resolves this situation differently. It calls
mlx4_en_stop_port() from mlx4_en_set_channels() with parameter 'detach'==1 that
causes that netif_device_detach() is called prior stopping of Tx queues. This
also effectively prevents dev_watchdog() to call .ndo_tx_timeout() callback. An
advantage is netif_device_detach() does not fire linkwatch events.
So... what ways is the _right_ one ?
Thanks,
Ivan