On Tue, Jun 25, 2019 at 12:18 PM Jamal Hadi Salim <j...@mojatatu.com> wrote: > > On 2019-06-24 6:13 p.m., John Hurley wrote: > > These patches aim to prevent act_mirred causing stack overflow events from > > recursively calling packet xmit or receive functions. Such events can > > occur with poor TC configuration that causes packets to travel in loops > > within the system. > > > > Florian Westphal advises that a recursion crash and packets looping are > > separate issues and should be treated as such. David Miller futher points > > out that pcpu counters cannot track the precise skb context required to > > detect loops. Hence these patches are not aimed at detecting packet loops, > > rather, preventing stack flows arising from such loops. > > Sigh. So we are still trying to save 2 bits? > John, you said ovs has introduced a similar loop handling code; > Is their approach similar to this? Bigger question: Is this approach > "good enough"? >
Hi Jamal. Yes, OvS implements a similar approach to prevent recursion: https://elixir.bootlin.com/linux/v5.2-rc6/source/net/openvswitch/actions.c#L1530 It was discussed on a previous thread that there are really 2 issues here (even if it is the same thing that triggers them). Firstly, the infinite looping of packets caused by poor config and, secondly, the kernel panic caused by a stack overflow due to the recursion in use. These patches target the latter. I think this approach is good enough to deal with the crashes as it tracks a packets recursive calls (through act_mirred) on the network stack. If the packet is scheduled and releases the CPU then the counter is reset. The packet may still loop but it will not cause stack overflows. > Not to put more work for you, but one suggestion is to use skb metadata > approach which is performance unfriendly (could be argued to more > correct). > > Also: Please consider submitting a test case or two for tdc. > > cheers, > jamal >