On Mon, Apr 27, 2015 at 10:45 PM, Scott Feldman <sfel...@gmail.com> wrote:
> On Mon, Apr 27, 2015 at 10:38 AM, <anurad...@cumulusnetworks.com> wrote:
>> From: Anuradha Karuppiah <anurad...@cumulusnetworks.com>
>>
>> User space daemons can detect errors in the network that need to be
>> notified to the switch device drivers.
>>
>> Drivers can react to this error state by doing a phy-down on the
>> switch-port which would result in a carrier-off locally and on the
>> directly connected switch. Doing that would prevent loops and
>> black-holes in the network.
>
> (Sorry if this was asked earlier)
>
> Can the application simply send a SETLINK with IFF_UP clear and the
> port driver's ndo_stop would bring the PHY link down?

(Re-sending as plain text)

Yes, clearing IFF_UP on detecting errors (PROTO_DOWN) is possible, and we
tried that implementation as well. Unfortunately it failed for the
following reasons:

1. There is no way to disambiguate between admin_down (!IFF_UP) and an
   APP/driver-enforced error_down (IFF_PROTO_DOWN). Administrators or
   automation scripts that monitor the config assumed that the switch-port
   configuration had somehow fallen out of sync, and repeatedly attempted
   to reinstate admin_up.

2. Automatic error recovery was not possible. Consider the following
   scenario:

   a. The MLAG peer-link is down, so the MLAG app on the secondary switch
      has proto_down'ed all the MLAG ports (including switch-port swp1) by
      clearing IFF_UP.

   b. At the same time the administrator is in the process of making some
      changes on the network connected to swp1. To avoid doing it live he
      admin-disables swp1 (!IFF_UP) with "ip link set swp1 down". This is
      a no-op, as event (a) has already cleared IFF_UP on swp1.

   c. If the MLAG peer-link recovers at this point, the MLAG app on the
      secondary switch tries to automatically recover the MLAG ports,
      including swp1, by clearing proto_down (i.e. setting IFF_UP). Doing
      that overrides the administrator's directive to keep swp1
      admin_down.

   Overriding an admin-down in a live network can be very dangerous, so
   auto-error-recovery is not possible unless we have a way to
   disambiguate between the admin and error states.
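
To make the scenario in (a)-(c) concrete, here is a minimal Python sketch. It is a toy model only, not kernel code: the class names SingleFlagPort and TwoFlagPort, their fields, and the run_scenario helper are all hypothetical, and the rule that a port is usable only when it is admin-up and proto_down is clear is my assumption about how a separate flag would combine with IFF_UP.

```python
from dataclasses import dataclass


@dataclass
class SingleFlagPort:
    """Hypothetical model: the app reuses IFF_UP to signal an error."""
    iff_up: bool = True

    def set_error(self) -> None:
        self.iff_up = False   # error-down is indistinguishable from admin-down

    def clear_error(self) -> None:
        self.iff_up = True    # recovery blindly sets IFF_UP again

    def link_up(self) -> bool:
        return self.iff_up


@dataclass
class TwoFlagPort:
    """Hypothetical model: admin state and proto_down are tracked separately."""
    iff_up: bool = True
    proto_down: bool = False

    def set_error(self) -> None:
        self.proto_down = True

    def clear_error(self) -> None:
        self.proto_down = False   # admin state is left untouched

    def link_up(self) -> bool:
        # Assumption: the port carries traffic only if both conditions hold.
        return self.iff_up and not self.proto_down


def run_scenario(port) -> bool:
    """Replay steps a, b, c and report whether swp1 ends up usable."""
    port.set_error()      # a. peer-link down, app error-downs the port
    port.iff_up = False   # b. admin runs "ip link set swp1 down"
    port.clear_error()    # c. peer-link recovers, app clears the error
    return port.link_up()


single_result = run_scenario(SingleFlagPort())
two_flag_result = run_scenario(TwoFlagPort())
print("single flag, port up after recovery:", single_result)
print("two flags,   port up after recovery:", two_flag_result)
```

With the single flag, the port comes back up after step (c) even though the administrator asked for it to stay down. With the separate flag it stays down until the administrator brings it up again.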