[EMAIL PROTECTED] writes:

> Please note first that I want to address physical failures by
> the failover-capable network devices, which are increasingly
> becoming important as Xen-based VM systems are getting popular.
> Reducing a single-point-of-failure (physical device) is vital on
> such VM systems.

But you typically still have lots of other single points of failure in a
single system, some of them considerably less reliable than your typical
NIC. At least it gives impressive demos when pulling Ethernet cables :)

> 1. Network device layer detects a failure first and switch to a
>    backup device (say, in 20sec).
>
> 2. TCP layer timeout & retransmission comes next, _hopefully_
>    before the application layer timeout.
>
> 3. Application layer detects a network failure last (by, say,
>    30sec timeout) and may trigger a system-level failover.
>
> It should be noted that the timeouts for #1 and #2 are handled
> independently and there is no relationship between them.

> If TCP retransmission misses the time frame between event #1 and
> #3 in Background above (between 20 and 30sec since network
> failure), a failure causes the system-level failover where the
> network-device-level failover should be enough.

You should probably make sure that the device ends up returning the
right NET_XMIT_* code to TCP for such drops, in particular NET_XMIT_DROP.
This might require slight driver interface changes.

Also, as far as I know that return code currently only affects the
congestion window; it might be reasonable to let it affect the timer
backoff too.

-Andi
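
To make the timing problem in the quoted text concrete, here is a rough sketch of when retransmissions would fire under plain exponential backoff, and whether any of them lands between the device failover (20 sec) and the application timeout (30 sec). It is a simplified model, not the kernel's actual timer code: the function names, the initial RTO values and the 120 sec cap are assumptions for illustration, and RTT estimation is ignored entirely.

```python
def retransmit_times(initial_rto, max_rto, horizon):
    """Cumulative times (seconds after the failure) at which a
    retransmission fires, doubling the timeout after each attempt.
    Simplified model: fixed initial RTO, no RTT estimation."""
    times = []
    now = 0.0
    rto = initial_rto
    while True:
        now += rto
        if now > horizon:
            break
        times.append(now)
        rto = min(rto * 2, max_rto)  # exponential backoff, capped
    return times


def hits_window(initial_rto, failover_done=20.0, app_timeout=30.0,
                max_rto=120.0):
    """True if some retransmission fires after the device failover has
    completed but before the application gives up."""
    times = retransmit_times(initial_rto, max_rto, app_timeout)
    return any(failover_done <= t < app_timeout for t in times)


if __name__ == "__main__":
    for rto in (1.0, 3.0):  # hypothetical initial RTO values
        print(rto, retransmit_times(rto, 120.0, 30.0), hits_window(rto))
```

With an initial RTO of 1 sec the retransmissions fire at 1, 3, 7 and 15 sec, and the next one would come at 31 sec, so the 20 to 30 sec window is missed and the application times out first. With 3 sec they fire at 3, 9 and 21 sec, and the connection recovers. Whether the window is hit depends only on where the backoff sequence happens to fall, which is the missing relationship between #1 and #2 that the quoted text describes.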