On 5/26/23 08:37, Ales Musil wrote:
> On Fri, May 26, 2023 at 7:58 AM Han Zhou <hz...@ovn.org> wrote:
> 
>>
>>
>> On Thu, May 25, 2023 at 9:19 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
>>>
>>> On 5/25/23 14:08, Ales Musil via discuss wrote:
>>>> Hi,
>>>>
>>>> To improve the MAC binding aging mechanism, we need a way to ensure
>> that rows which are still in use are preserved. This doesn't happen with
>> the current implementation.
>>>>
>>>> I propose the following solution, which should solve the issue; any
>> questions or comments are welcome. If there isn't anything major that would
>> block this approach, I would start implementing it so that it can be
>> available in 23.09.
>>>>
>>>> For the approach itself:
>>>>
>>>> Add "mac_cache_use" action into "lr_in_learn_neighbor" table (only the
>> flow that continues on known MAC binding):
>>>> match=(REGBIT_LOOKUP_NEIGHBOR_RESULT == 1 ||
>> REGBIT_LOOKUP_NEIGHBOR_IP_RESULT == 0), action=(next;)  ->
>> match=(REGBIT_LOOKUP_NEIGHBOR_RESULT == 1 ||
>> REGBIT_LOOKUP_NEIGHBOR_IP_RESULT == 0), action=(mac_cache_use; next;)
>>>>
>>>> The "mac_cache_use" would translate to resubmit into separate table
>> with flows per MAC binding as follows:
>>>> match=(ip.src=<MB_IP>, eth.src=<MB_MAC>, datapath=<MB_Datapath>),
>> action=(drop;)
>>
>> It is possible that some workload has heavy traffic for ingress direction
>> only, such as some UDP streams, but not sending anything out for a long
>> interval. So I am not sure if using "src" only would be sufficient.
>>
> 
> Using table 67 (for the "dst" check) still has the black hole problem. I'm
> not sure there is a universal solution that would handle UDP traffic that
> goes only one way, in either direction.
> Also, the negative effect of re-ARPing on UDP traffic should be a lot
> smaller than on connection-oriented protocols.
> 
> 

I agree, it's not sufficient for all cases but I don't think anything
works for the uni-directional traffic scenario.  We need to be
conservative and re-ARP in that case.  So Ales' suggestion sounds good
to me.

>>
>>>
>>> One concern here would be that it will likely cause a packet clone
>>> in the datapath just to immediately drop it.  So, might have a
>>> noticeable performance impact.
>>>
>> +1. We need to be careful to avoid any dataplane performance impact, which
>> doesn't sound justified for the value.
>>
> 
> We can test whether there is a clone or not; if so, we might need to
> include it as a regular table in the pipeline.
> 
> 

A test should clarify whether there will be a clone() or not.

>>
>>>>
>>>> This should bump the statistics every time for the correct MAC
>> binding. In ovn-controller we could periodically dump the flows from this
>> table. The period would be set to MIN(mac_binding_age_threshold/2) over all
>> local datapaths. The dump would happen from a different thread with its own
>> rconn to prevent backlogging issues. The thread would receive mapped data
>> from an I-P node that would keep track of the datapath -> cookies -> MAC
>> bindings mapping. This allows us to avoid constant lookups, but at the cost
>> of keeping track of all local MAC bindings. To save some computation time,
>> this I-P could be relevant only for datapaths that actually have the
>> threshold set.
>>>>
>>>> If the "idle_age" of a particular flow is smaller than the datapath's
>> "mac_binding_age_threshold", it means that it is still in use. To prevent a
>> lot of updates if the traffic is still relevant on multiple controllers, we
>> would check whether the timestamp is older than the "dump period"; if not,
>> we don't have to update it, because someone else already did.
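For illustration, the per-record decision described above could be sketched roughly as follows; `should_update_timestamp`, its parameters, and the constants are hypothetical names for this sketch, not actual ovn-controller code:

```python
import time

def should_update_timestamp(idle_age_s, timestamp_s, threshold_s,
                            dump_period_s, now_s=None):
    """Decide whether a MAC binding's SB timestamp needs refreshing.

    idle_age_s    -- seconds since the per-binding flow last saw traffic
    timestamp_s   -- last recorded use (epoch seconds) in the Southbound DB
    threshold_s   -- mac_binding_age_threshold of the binding's datapath
    dump_period_s -- how often this controller dumps the flow table
    """
    if now_s is None:
        now_s = time.time()
    if idle_age_s >= threshold_s:
        return False  # no recent traffic; let the binding age out
    if now_s - timestamp_s <= dump_period_s:
        return False  # another controller refreshed it recently
    return True       # still in use and stale enough; refresh it
```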
>>
>> Thanks for trying to reduce the number of updates to the SB DB, but I still
>> have some concerns about this. In theory, to prevent records from being
>> deleted while they are still used, at least one timestamp update is required
>> per threshold for each record. Even if we bundle the updates from each
>> node, assume that the workloads that own the IP/MAC of the mac_binding
>> records are distributed across 1000 nodes and the aging threshold is 30s:
>> there will still be ~30 updates/s (if we can evenly distribute the updates
>> from different nodes). That's still a lot, which may keep the SB server and
>> all the ovn-controllers busy just for these messages. If the aging threshold
>> is set to 300s (5min), it may look better: ~3 updates/s, but this could
>> still contribute a major part of the SB <-> ovn-controller messages.
>> E.g. in an ovn-k8s deployment the cluster LR is distributed on all nodes, so
>> all nodes would need to monitor all mac-binding timestamp updates related
>> to the cluster LR, which means all mac-binding updates from all nodes. In
>> reality the number of messages may be doubled if we use the proposed
>> dump-and-check interval of mac_binding_age_threshold/2.
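The estimate above is just the node count divided by the threshold, assuming one record per node refreshed once per threshold; a quick sketch of the arithmetic (best case, before the possible doubling mentioned above):

```python
def sb_update_rate(nodes, threshold_s):
    """Best-case SB timestamp updates per second: one refresh per record per
    threshold, one record owned per node, updates spread evenly in time."""
    return nodes / threshold_s

# 1000 nodes with a 30s threshold is roughly 33 updates/s;
# with a 300s threshold, roughly 3.3 updates/s.
```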
>>
> 
> Arguably, a 30 second threshold is way too low. You are right that a
> double update might happen in some cases; that can still be mitigated by
> shifting the timestamp check to 3/4 of the threshold, for example. That
> should still give us enough time to perform the update, while greatly
> reducing the chance of doing it twice.
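The 3/4-of-threshold refinement could look like this minimal sketch (`needs_refresh` is a hypothetical helper, not existing code):

```python
def needs_refresh(now_s, timestamp_s, threshold_s):
    """Refresh the SB timestamp only once the record is older than 3/4 of
    the aging threshold.  Compared to refreshing after threshold/2, this
    shrinks the window in which two controllers dumping out of phase both
    see the record as stale and update it twice, while still leaving
    threshold/4 of margin to get the update in before expiry."""
    return now_s - timestamp_s > 0.75 * threshold_s
```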
> 

Should we add some recommendations for "good" aging timeout values in
ovn-nb.xml?

> One thing to keep in mind is that these updates would happen only for MAC
> bindings that have aging enabled. For parts of the cluster that are fixed
> enough, or where the MAC binding table grows very slowly, it might not make
> sense to enable aging at all, or only with a pretty big threshold, e.g.
> 3000s.
> 
> 
>>
>> So, I'd evaluate both the dataplane and control plane cost before going
>> for formal implementation.
>>
> 
> The control plane impact is really configuration-dependent, which is fine
> IMO. If we document the possible outcome, I think it should be safe. I will
> for sure try to minimize it as much as possible with the mentioned
> checks/optimizations.
> 
> 

As long as the default configuration (aging disabled) doesn't incur any
control/data plane impact, I think we're OK.  I do hope that there's no
significant data plane impact with aging enabled either, though.

>>
>> Thanks,
>> Han
>>
>>>
>>>>
>>>> Also, to "desync" the controllers, a random delay would be added to the
>> "dump period".
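A minimal sketch of such de-synchronization, assuming a simple uniform jitter (the bound here is an arbitrary choice for illustration):

```python
import random

def jittered_dump_period(base_period_s, max_jitter_s=5.0):
    """Add a bounded random delay so that controllers don't all dump their
    flows (and write to the SB) at exactly the same moment."""
    return base_period_s + random.uniform(0.0, max_jitter_s)
```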
>>>>
>>>> All of this would be applicable to FDB aging as well.
>>>>
>>>> Does that sound reasonable?
>>>> Please let me know if you have any comments/suggestions.
>>>>
>>>> Thanks,
>>>> Ales
>>>
>>

Thanks,
Dumitru
