On Tue, Nov 14, 2017 at 01:59:03PM -0800, Shaohua Li wrote:
> On Tue, Nov 14, 2017 at 11:13:10AM -0800, Tom Herbert wrote:
> > On Tue, Nov 14, 2017 at 10:24 AM, Shaohua Li <s...@kernel.org> wrote:
> > > On Wed, Nov 08, 2017 at 09:44:51AM -0800, Tom Herbert wrote:
> > >> On Fri, Aug 18, 2017 at 3:27 PM, David Miller <da...@davemloft.net> 
> > >> wrote:
> > >> > From: Martin KaFai Lau <ka...@fb.com>
> > >> > Date: Fri, 18 Aug 2017 13:51:36 -0700
> > >> >
> > >> >> It seems like that middle box specifically drops TCP_RST if it
> > >> >> does not know anything about this flow.  Since the flowlabel of the 
> > >> >> TCP_RST
> > >> >> (sent in TW state) is always different, it always lands to a 
> > >> >> different middle
> > >> >> box.  All of these TCP_RST cannot be delivered.
> > >> >
> > >> > This really is illegal behavior.  The flow label is not a flow _KEY_
> > >> > by any definition whatsoever.
> > >> >
> > >> > Flow labels are an optimization, not a determinant for flow matching
> > >> > particularly for proper TCP state processing.
> > >> >
> > >> > I'd rather you invest all of this energy getting that vendor to fix
> > >> > their kit.
> > >> >
> > >> We're now seeing several router vendors recommending people to not use
> > >> flow labels for ECMP hashing. This is precisely because when a flow
> > >> label changes, network devices that maintain state (firewalls, NAT,
> > >> load balancers) can't deal with packets being rerouted so connections
> > >> are dropped. Unfortunately, the need for packets of a flow to always
> > >> follow the same path has become an implicit requirement that I think
> > >> we need follow at least as the default behavior.
> > >>
> > >> Martin: is there any change you could resurrect these patches? In
> > >> order to solve the general problem of making routing consistent, I
> > >> believe we want to keep sk_tx_hash consistent for the connection from
> > >> which a consistent flow label can be derived. To avoid the overhead of
> > >> a hash field in sk_common, maybe we could initially set a connection
> > >> hash to a five-tuple hash for a flow instead of a random value? So in
> > >> TW state the consistent hash can be computed on the fly.
> > >
> > > Hi Tom,
> > > Do we really need to use the five-tupe hash? There are several places 
> > > using
> > > current random hash, which looks more lightweight. To fix issue, we only 
> > > need
> > > to make sure reset packet include the correct flowlabel. Like what my 
> > > previous
> > > patch did, we can set tw->tw_flowlabel in tcp_time_wait based on txhash 
> > > and use
> > > it reset packet. In this way we can use the random hash and not add extra 
> > > field
> > > in sock.
> > >
> > Shaohua,
> > 
> > But that patch discards the full txhash in TW. So it's not just a
> > problem with the flow label. sk_tx_hash can also be used for route
> > selection in ECMP, port selection we're doing tunneling, etc. The
> > general solution should maintains tx_hash or be able to reconstruct it
> > in any state, flow label fix is a point solution.
> 
> Hi Tom,
> 
> do you want to keep sk_rethink_txhash() then? If we changed the hash to random
> number, we can't reconstruct it for sure.
A followup question on rethink.  Does it mean we need
a new sysctl (persistent_txhash) to avoid sk_rethink_txhash() together
such that it keeps the routing decision consistent (e.g. flowlabel) ?

> 
> Thanks,
> Shaohua

Reply via email to