On 8/2/20 8:49 AM, Ido Schimmel wrote:
> On Thu, Jun 11, 2020 at 10:36:59PM -0600, David Ahern wrote:
>> On 6/11/20 6:32 PM, Yi Yang (杨燚)-云服务集团 wrote:
>>> David, thank you so much for confirming it can't. I did read your
>>> Cumulus document before; resilient hashing is OK for next hop
>>> removal, but it still has the same issue when a new next hop is
>>> added. I know most of the kernel code in Cumulus Linux has been
>>> upstreamed, so I'm wondering why you didn't push resilient hashing
>>> to the upstream kernel.
>>>
>>> I think consistent hashing is a must-have for a commercial load
>>> balancing solution; without it such a solution is basically
>>> useless. Does Cumulus Linux have a consistent hashing solution?
>>>
>>> Is "- replacing nexthop entries as LB's come and go" the stuff
>>> https://docs.cumulusnetworks.com/cumulus-linux/Layer-3/Equal-Cost-Multipath-Load-Sharing-Hardware-ECMP/#resilient-hashing
>>> is showing? It can't ensure a flow is distributed to the right
>>> backend server if a new next hop is added.
>>
>> I do not believe this is a problem to be solved in the kernel.
>>
>> If you follow the *intent* of the Cumulus document: what is the
>> maximum number of load balancers you expect to have? 16? 32? 64?
>> Define an ECMP route with that number of nexthops and fill in the
>> weighting that meets your needs. When an LB is added or removed,
>> you decide what the new set of paths is that maintains N total
>> paths with the distribution that meets your needs.
>
> I recently started looking into consistent hashing and I wonder if it
> can be done with the new nexthop API while keeping all the logic in
> user space (e.g., FRR).
>
> The only extension that might be required from the kernel is a new
> nexthop attribute that indicates when a nexthop was last used.
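
For illustration, a rough, untested sketch of that fixed-size group
approach with the iproute2 nexthop API (this is not the script
referenced later in this mail; addresses, device and nexthop IDs are
made up):

  # four fixed "slots", initially spread across two load balancers
  ip nexthop add id 1 via 10.0.0.1 dev eth0
  ip nexthop add id 2 via 10.0.0.2 dev eth0
  ip nexthop add id 3 via 10.0.0.1 dev eth0
  ip nexthop add id 4 via 10.0.0.2 dev eth0

  # the route always points at the same 4-wide group
  ip nexthop add id 100 group 1/2/3/4
  ip route add 203.0.113.0/24 nhid 100
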
The only potential problem that comes to mind is that a nexthop can be
used by multiple prefixes.

But I'm not sure I follow what the last-used indicator gives you for
maintaining flows as a group is updated.

> User space can then use it to understand which nexthops to replace
> when a new nexthop is added and when to perform the replacement. In
> case the nexthops are offloaded, it is possible for the driver to
> periodically update the nexthop code about their activity.
>
> Below is a script that demonstrates the concept with the example in
> the Cumulus documentation. I chose to replace the individual nexthops
> instead of creating new ones and then replacing the group.

That is one of the features of the nexthop API: a group points to
individual nexthops, and those can be updated atomically without
affecting the group.

> It is obviously possible to create larger groups to reduce the impact
> on existing flows when a new nexthop is added.
>
> WDYT?

This is in line with my earlier responses, and your script shows an
example of how to manage it. Combine it with the active-backup patch
set and you handle device events too (i.e., avoid disrupting the size
of the group on device events).
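
To make the atomic-update point concrete (again just an illustration,
not the posted script, and continuing the made-up addresses/IDs from
the sketch above): when a third LB at 10.0.0.3 comes up, one slot is
repointed while the group and the route stay untouched, so only flows
hashed to that member move.

  # repoint one member; group 100 keeps its size and ID
  ip nexthop replace id 4 via 10.0.0.3 dev eth0

  # the group still references ids 1/2/3/4
  ip nexthop get id 100
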