On Wed, Jan 24, 2018 at 10:59:21AM +0100, Steffen Klassert wrote:
> On Fri, Jan 19, 2018 at 03:45:46PM +0100, Tobias Hommel wrote:
> >
> > I tried to strip down the system configuration and was able to reproduce the
> > problem with a minimal configuration:
> >   * ipsets are not used anymore
> >   * no firewall markings are used any longer
> >   * iptables are "completely empty", i.e. all policies set to ACCEPT and
> >     there is no rule in any table
> >   * no additional routing policies (ip rule) except the default ones
> >   * only main routing table is used
> >   * using a "minimal" kernel config:
> >     * run `make defconfig`
> >     * add basic things (ESP, IGB driver, some crypto algorithms)
> >     * add options required to boot up the system (TPM crypt, some device mapper
> >       options, overlayfs)
> >
> > I attached the minimal config (minimal.config) and the defconfig for
> > reference (minimal.defconfig).
> >
> > The setup is really simple now, the gateway is forwarding HTTP connections
> > between eth1(IPSec tunnels) and eth0 without any firewall, NAT, whatsoever.
>
> Thanks a lot for your debugging effort!
>
> >
> > The only thing I can think of are the rather aggressive roadwarrior clients.
> > There are 750 roadwarriors that are controlled by a script which starts and
> > stops the IPSec connection.
>
> I still can't reproduce it with my tests. This is probably some race
> triggered due to your aggressive roadwarrior setup which I don't have.
>
> > I tried 4.15-rc8 and have the same problem here (see attached
> > kernel-4.15-rc8.log). SMP affinity for IRQs has changed in 4.15 and
> > something's
>
> There is one patch that could influence this which is not in v4.15-rc8:
>
> commit 76a4201191814a0061cb5c861fafb9ecaa764846
> ("xfrm: Fix a race in the xdst pcpu cache.")
>
> It is included in v4.15-rc9.

I already tested that one some weeks ago, when it appeared on the mailing list,
with 4.14. Without any luck.

>
> If this does not fix your problem, I'm out of ideas. In this case
> I have to ask to do a bisection to find the offending commit.
>

I'll do a bisect session then. It'll take some time though as the hardware is
currently occupied with other tests.
I'll keep you up-to-date about the results.