Hi Andi,

Andi Kleen <[EMAIL PROTECTED]> writes:

> > Please note first that I want to address physical failures by
> > the failover-capable network devices, which are increasingly
> > becoming important as Xen-based VM systems are getting popular.
> > Reducing a single-point-of-failure (physical device) is vital on
> > such VM systems.
>
> Just you typically still have lots of other single points of failures in
> a single system, some of them quite less reliable than your typical
> NIC. But at least it gives impressive demos when pulling ethernet cables @)

Indeed :-)

> > If TCP retransmission misses the time frame between event #1 and
> > #3 in Background above (between 20 and 30sec since network
> > failure), a failure causes the system-level failover where the
> > network-device-level failover should be enough.
>
> You should probably make sure that the device ends up returning the
> right NET_XMIT_* code for such drops to TCP, in particular
> NET_XMIT_DROP. This might require slight driver interface
> changes. Also right now it only affects the congestion window, I think,
> it might be reasonable to let it affect the timer backoff too.

Well, I don't think that would help.  Your suggestion, to use the
NET_XMIT_* code returned from the underlying layer, is already
handled in tcp_transmit_skb.  But my problem is that
tcp_transmit_skb is not called at all during a certain period of
time, so there is no return code to act on.  That is why I'm
suggesting to cap the RTO value, so that tcp_transmit_skb gets
called more frequently.

Does that make sense, Andi?

Regards,

--
OBATA Noboru ([EMAIL PROTECTED])
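
P.S. To illustrate the timing argument, here is a rough user-space
sketch in Python (not kernel code).  It assumes an initial RTO of 1
second, plain doubling on every timeout, and the 20 to 30 second window
mentioned above.  The 5-second cap, the 120-second ceiling used for the
"uncapped" case, and the helper names are hypothetical values picked
only for illustration.

```python
def retransmit_times(initial_rto, rto_cap, horizon):
    """Return the times (seconds after the failure) at which the
    retransmission timer fires, doubling the RTO after each timeout
    and clamping it to rto_cap.  Simplified model, not kernel logic."""
    times = []
    now = 0.0
    rto = initial_rto
    while True:
        now += rto
        if now > horizon:
            break
        times.append(now)
        rto = min(rto * 2, rto_cap)  # exponential backoff with a ceiling
    return times


def hits_window(times, start, end):
    """True if at least one retransmission falls inside [start, end]."""
    return any(start <= t <= end for t in times)


# Assumed numbers: 1 s initial RTO, window of 20-30 s after the failure.
WINDOW = (20.0, 30.0)
uncapped = retransmit_times(1.0, 120.0, 60.0)  # assumed large ceiling
capped = retransmit_times(1.0, 5.0, 60.0)      # hypothetical 5 s cap

print("uncapped:", uncapped, "hits window:", hits_window(uncapped, *WINDOW))
print("capped:  ", capped, "hits window:", hits_window(capped, *WINDOW))
```

With these assumed numbers the uncapped timer fires at 15 and 31
seconds and skips the window entirely, while the capped timer fires at
22 and 27 seconds, so a retransmission is attempted while the
device-level failover is in effect.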