Hi Roi,

On Sun, Jan 31, 2021 at 03:18:34PM +0200, Roi Dayan wrote:
[...]
> Hi Pablo,
> 
> We did more tests with just updating the timeout in the 2 callers
> and it's not enough. We reproduce the issue of rules being timed
> out just now frim different place.

Thanks for giving it a try to my suggestion, it was not correct.

> There is a 3rd caller nf_ct_gc_expired() which being called by 3
> other callers:
> ____nf_conntrack_find()
> nf_conntrack_tuple_taken()
> early_drop_list()

Hm. I'm not sure yet what path is triggering this bug.

Florian came up with the idea of setting a very large timeout for
offloaded flows (that are refreshed by the garbage collector) to avoid
the extra check from the packet path, so those 3 functions above never
hit the garbage collection path. This also applies for the ctnetlink
(conntrack -L) and the /proc/net/nf_conntrack sysctl paths that the
patch describes, those should not ever see an offloaded flow with a
small timeout.

nf_ct_offload_timeout() is called from:

#1 flow_offload_add() to set a very large timer.
#2 the garbage collector path, to refresh the timeout the very large
   offload timer.

Probably there is a race between setting the IPS_OFFLOAD and when
flow_offload_add() is called? Garbage collector gets in between and
zaps the connection. Is a newly offloaded connection that you observed
that is being removed?

> only early_drop_list() has a check to skip conns with offload bit
> but without extending the timeout.
> I didnt do a dump but the issue could be from the other 2 calls.
> 
> With current commit as is I didn't need to check more callers as I made
> sure all callers will skip the non-offload gc.
>
> Instead of updating more callers and there might be more callers
> later why current commit is not enough?
> We skip offloaded flows and soon gc_worker() will hit and will update
> the timeout anyway.

Another possibility would be to check for the offload bit from
nf_ct_is_expired(), which is coming slighty before nf_ct_should_gc().
But this is also in the ____nf_conntrack_find() path.

Florian?

Reply via email to