On 2023-09-14 3:28 AM, Kristof Provost wrote:
On 14 Sep 2023, at 4:54, Xin Li wrote:
Hi!

I recently upgraded my home router and found that there is some regression 
related to pf or IPv6.

When attempting to connect to an IPv6 TCP service, the process would enter a seemingly 
unkillable state (the stack varies but always begins with write, so it seems 
that Tailscale was trying to send some packet to the server), until racoon was 
killed and restarted (at which point the connection would be dropped).

tcpdump over the gif(4) tunnel captured a lot of seemingly duplicated packets 
like this:

03:40:50.088262 IP6 LOCAL.16275 > REMOTE.443: Flags [.], seq 1619:2947, ack 
4225, win 129, options [nop,nop,TS val 2817088580 ecr 3077807235], length 1328
03:40:50.088332 IP6 LOCAL.16275 > REMOTE.443: Flags [.], seq 1619:2947, ack 
4225, win 129, options [nop,nop,TS val 2817088581 ecr 3077807235], length 1328
[identical except timestamp]
03:40:50.089107 IP6 LOCAL.16275 > REMOTE.443: Flags [.], seq 1619:2947, ack 
4225, win 129, options [nop,nop,TS val 2817088581 ecr 3077807235], length 1328

Am I the only person who is seeing this?  (Admittedly my setup is somewhat 
unusual: my home ISP doesn't provide IPv6 service, so I have a gif(4) tunnel to 
my datacenter, which connects to Hurricane Electric's IPv6 tunnel service and 
basically routes my IPv6 traffic through that tunnel.  Earlier I discovered that 
some IPv6 connectivity issues were related to the MTU being too big (1480; reduced 
to 1400 now), but the unkillable IPv6 applications issue was new and only happened 
on 14.x.)
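The MTU reduction mentioned above is done on the tunnel interface itself; a minimal sketch (gif0 is an assumption here, and the setting should be persisted in rc.conf to survive reboots):

```shell
# Lower the gif(4) tunnel MTU so encapsulated IPv6 packets fit inside
# the outer IPv4 path MTU (1400 leaves headroom for gif + IPsec overhead):
ifconfig gif0 mtu 1400

# Optionally, TCP MSS can also be clamped in pf.conf so peers never try
# to send segments that would exceed the tunnel MTU, e.g.:
#   scrub on gif0 max-mss 1340
```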


That doesn’t immediately ring any bells, no.

Are you using route-to anywhere? There’s been a change 
(829a69db855b48ff7e8242b95e193a0783c489d9) that has some potential to affect 
uncommon setups, but right now I’m just guessing.

No.  Actually, my IPv6-related rules were quite simple:

pass in quick inet6 proto tcp from <myv6> to any flags S/SA keep state
block in quick inet6 proto tcp from ! <myv6> to <myv6> flags S/SA
block out quick inet6 proto tcp from ! <myv6> to <myv6> flags S/SA

(where <myv6> is the /48 prefix from the Hurricane Electric tunnel).

The rest of the rules were mostly NAT rules that translate my internal IPv4 addresses to the WAN interface IP, plus a set of rules to block IPs passed in from sshguard:

block in quick on $ext_if proto tcp from <sshguard> to any port 22 label "ssh bruteforce"
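The <sshguard> table is populated dynamically; its current contents can be inspected with pfctl (a sketch; requires root):

```shell
# List the addresses currently held in the <sshguard> table:
pfctl -t sshguard -T show
```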


I’d recommend tcpdump-ing the wan link at the same time as the gif tunnel so 
you can work out if the packets are being dropped locally or remotely. Or you 
can try adding ‘log’ statements to the pf rules and using pflog to figure out 
if/why packets are being dropped.
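Concretely, that could look something like this (the interface names, pflog instance, and filter expressions are assumptions for illustration):

```shell
# Capture on the tunnel and the WAN link at the same time (one per
# terminal); a packet seen on gif0 but never on the WAN side points at
# a local drop:
tcpdump -ni gif0 -w gif0.pcap ip6
tcpdump -ni em0 -w wan.pcap 'host TUNNEL_ENDPOINT_IP4'

# After adding "log" to the pf rules of interest, e.g.
#   pass in log quick inet6 proto tcp from <myv6> to any flags S/SA keep state
# watch what pf logs on the pflog interface:
tcpdump -nettti pflog0
```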

The gif tunnel's traffic is significantly larger than the WAN port's (captured ~2.5 GB on gif, but only 69 KB on WAN related to the tunnel, using 'host <tunnelIP>' as the filter), so it's _probably_ something that gets dropped locally.

The changes between stable/13 and stable/14 appeared to be quite harmless (it's mostly marius@'s e82d7b2952afaf9625a3d7b05d03c43c1d3e7d9c, and I can't see how it could have caused this).

And as a shot in the dark, I tried again with IPsec (racoon) disabled, and the issue is gone. My IPsec configuration is fairly common:

===
flush;
spdflush;

spdadd WAN_IP4 REMOTE_IP4 any -P out ipsec esp/transport//require;
spdadd REMOTE_IP4 WAN_IP4 any -P in ipsec esp/transport//require;
===
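Since racoon is in the picture, it can also help to inspect what is actually installed in the kernel (a sketch; requires root):

```shell
# Dump the security policy database; the two transport-mode policies
# above should appear here:
setkey -DP

# Dump the security association database; stale or duplicated SAs left
# over from a racoon restart would show up here:
setkey -D
```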

I'm still comparing the code and reading the history of changes between stable/13 and stable/14 to see if there is anything obvious, but more insight from others would be appreciated :)

Cheers,
