On Tue, Nov 14, 2017 at 01:59:03PM -0800, Shaohua Li wrote: > On Tue, Nov 14, 2017 at 11:13:10AM -0800, Tom Herbert wrote: > > On Tue, Nov 14, 2017 at 10:24 AM, Shaohua Li <s...@kernel.org> wrote: > > > On Wed, Nov 08, 2017 at 09:44:51AM -0800, Tom Herbert wrote: > > >> On Fri, Aug 18, 2017 at 3:27 PM, David Miller <da...@davemloft.net> > > >> wrote: > > >> > From: Martin KaFai Lau <ka...@fb.com> > > >> > Date: Fri, 18 Aug 2017 13:51:36 -0700 > > >> > > > >> >> It seems like that middle box specifically drops TCP_RST if it > > >> >> does not know anything about this flow. Since the flowlabel of the > > >> >> TCP_RST > > >> >> (sent in TW state) is always different, it always lands to a > > >> >> different middle > > >> >> box. All of these TCP_RST cannot be delivered. > > >> > > > >> > This really is illegal behavior. The flow label is not a flow _KEY_ > > >> > by any definition whatsoever. > > >> > > > >> > Flow labels are an optimization, not a determinant for flow matching > > >> > particularly for proper TCP state processing. > > >> > > > >> > I'd rather you invest all of this energy getting that vendor to fix > > >> > their kit. > > >> > > > >> We're now seeing several router vendors recommending people to not use > > >> flow labels for ECMP hashing. This is precisely because when a flow > > >> label changes, network devices that maintain state (firewalls, NAT, > > >> load balancers) can't deal with packets being rerouted so connections > > >> are dropped. Unfortunately, the need for packets of a flow to always > > >> follow the same path has become an implicit requirement that I think > > >> we need follow at least as the default behavior. > > >> > > >> Martin: is there any change you could resurrect these patches? In > > >> order to solve the general problem of making routing consistent, I > > >> believe we want to keep sk_tx_hash consistent for the connection from > > >> which a consistent flow label can be derived. To avoid the overhead of > > >> a hash field in sk_common, maybe we could initially set a connection > > >> hash to a five-tuple hash for a flow instead of a random value? So in > > >> TW state the consistent hash can be computed on the fly. > > > > > > Hi Tom, > > > Do we really need to use the five-tupe hash? There are several places > > > using > > > current random hash, which looks more lightweight. To fix issue, we only > > > need > > > to make sure reset packet include the correct flowlabel. Like what my > > > previous > > > patch did, we can set tw->tw_flowlabel in tcp_time_wait based on txhash > > > and use > > > it reset packet. In this way we can use the random hash and not add extra > > > field > > > in sock. > > > > > Shaohua, > > > > But that patch discards the full txhash in TW. So it's not just a > > problem with the flow label. sk_tx_hash can also be used for route > > selection in ECMP, port selection we're doing tunneling, etc. The > > general solution should maintains tx_hash or be able to reconstruct it > > in any state, flow label fix is a point solution. > > Hi Tom, > > do you want to keep sk_rethink_txhash() then? If we changed the hash to random > number, we can't reconstruct it for sure. A followup question on rethink. Does it mean we need a new sysctl (persistent_txhash) to avoid sk_rethink_txhash() together such that it keeps the routing decision consistent (e.g. flowlabel) ?
> > Thanks, > Shaohua