Hello,

On Fri, Aug 30, 2024 at 08:12:52PM +0000, Jason Healy wrote:
> Good afternoon,
>
> Thank you for your analysis, and my apologies for the slow reply.  I have
no problem at all.

</snip>

> Your first packet capture (icmp.pkt) contained packet-too-big messages like
> this:
>
> > 10.188.210.10 > 10.188.211.123: icmp: echo request (DF)
> > 10.188.211.123 > 10.188.210.10: icmp: \
> >     10.188.211.123 unreachable - need to frag (mtu 1480) (DF)
>
> The source of the packet-too-big is the target of the ping (10.188.211.123)
> and not the PF router (10.188.210.50; not explicitly stated but taken from
> your topology document).  So while a packet-too-big was received, it was not
> generated by the router itself (is the link MTU between PF and the test box
> greater than 1500?)
>

Things are more subtle here. The short answer is that the ICMP error
comes from the firewall (PF box). Let me explain what's happening.
Referring to the topology found in OpenBSD regress [1], we need to look
at the PF box more closely.

    in                                    <->  out
    10.188.210.50                              10.188.211.50
    fdd7:e83e:66bc:210:5054:ff:fe12:3450       fdd7:e83e:66bc:211:5054:ff:fe12:3450

PF acts as a router for the 10.188.211.0/24 subnet. There is also a rule
which does AF-translation for packets with destination 10.188.211.123,
translating them to fdd7:e83e:66bc:211:5054:ff:fe12:3451 (the IPv6
address of the RT box).

The ICMP echo-request to 10.188.211.123 enters PF at the in interface.
It is an inbound packet which matches the af-to rule. PF creates a state
and changes the destination IP address (and family) to
fdd7:e83e:66bc:211:5054:ff:fe12:3451. The source address comes from the
rule. The rule uses the IPv6 address of the out interface
(fdd7:e83e:66bc:211:5054:ff:fe12:3450).
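A side note on where the "mtu 1480" in the capture above can come from:
translating IPv4 to IPv6 replaces a 20-byte IPv4 header (no options)
with a 40-byte IPv6 header, so the packet grows by 20 bytes. The sketch
below only illustrates that arithmetic. It is not code from pf, and the
function names and the 1500-byte link MTU used in the note after it are
assumptions made for the example.

```c
#include <assert.h>

#define IPV4_HDR_LEN	20	/* IPv4 header without options */
#define IPV6_HDR_LEN	40	/* fixed IPv6 header */
#define HDR_GROWTH	(IPV6_HDR_LEN - IPV4_HDR_LEN)

/*
 * Hypothetical helper: length of an IPv4 packet once its header has
 * been replaced by an IPv6 header (payload is unchanged).
 */
static unsigned int
v4_to_v6_len(unsigned int v4_len)
{
	assert(v4_len >= IPV4_HDR_LEN);
	return v4_len - IPV4_HDR_LEN + IPV6_HDR_LEN;
}

/*
 * Hypothetical helper: MTU to report to the IPv4 sender when the
 * IPv6 side says "packet too big" with the given MTU.
 */
static unsigned int
v6_mtu_to_v4_mtu(unsigned int v6_mtu)
{
	assert(v6_mtu >= HDR_GROWTH);
	return v6_mtu - HDR_GROWTH;
}
```

Assuming a 1500-byte link on the out side, v4_to_v6_len(1500) gives
1520, which does not fit, and v6_mtu_to_v4_mtu(1500) gives 1480, which
matches the value in the quoted capture.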
The firewall then injects the packet into the IP stack to route it,
line 8004 here:

    7984                ip_forward(pd.m, ifp, NULL, flags);
    7985            } else
    7986                ip_output(pd.m, NULL, NULL, 0, NULL, NULL, 0);
    7987            break;
    7988        case AF_INET6:
    7989            if (pd.dir == PF_IN) {
    7990                int flags = IPV6_REDIRECT;
    7991
    7992                switch (atomic_load_int(&ip6_forwarding)) {
    7993                case 2:
    7994                    SET(flags, IPV6_FORWARDING_IPSEC);
    7995                    /* FALLTHROUGH */
    7996                case 1:
    7997                    SET(flags, IPV6_FORWARDING);
    7998                    break;
    7999                default:
    8000                    ip6stat_inc(ip6s_cantforward);
    8001                    action = PF_DROP;
    8002                    goto out;
    8003                }
    8004                ip6_forward(pd.m, NULL, flags);
    8005            } else
    8006                ip6_output(pd.m, NULL, NULL, 0, NULL, NULL);
    8007            break;

Note: the call to ip6_forward() happens for AF-translated packets only.
The caller of pf_test() sees the packet as discarded.

ip6_forward() finds the route (xmit interface) for the packet with
destination fdd7:e83e:66bc:211:5054:ff:fe12:3451. As it attempts to
transmit the packet, it finds out the packet does not fit on the wire.
It uses icmp6_error()/icmp6_reflect() to generate an ICMPv6 error. The
ICMPv6 packet looks like this:

    fdd7:e83e:66bc:211:5054:ff:fe12:3451 -> fdd7:e83e:66bc:211:5054:ff:fe12:3450
    (IPv6 address of in @ RT)               (IPv6 address of out @ PF)

The packet is injected into the IP stack on the PF box by a task. The
firewall intercepts such a packet as outbound. The first thing the
firewall does is try to find a matching state. It finds the state
created by the ICMP request. The state found also indicates the packet
needs to be translated back to IPv4. The address 10.188.211.123 comes
from the state.

I agree the translation could be smarter here, because the mbuf which
represents the packet should still keep the PF_TAG_GENERATED flag. The
firewall can use it to distinguish local ICMP errors from those which
come from remote boxes. If the PF_TAG_GENERATED flag is present, then the
firewall should pick up the source address from the local interface
instead of using the IPv4 address found in the state.
But this is a detail, because for remote hosts the firewall would have to
use the IPv4 address which comes from the state anyway.

[1] https://github.com/openbsd/src/blob/master/regress/sys/net/pf_forward/Makefile

> Your second packet capture (icmp-eco.pkt) contained packet-too-big messages
> like this:
>
> > 10.188.210.10 > 10.188.212.52: icmp: echo request (DF)
> > 10.188.211.51 > 10.188.210.10: icmp: \
> >     10.188.212.52 unreachable - need to frag (mtu 1300)
>
> Here again we have received a packet-too-big message, but it wasn't generated
> by the PF box but rather by RT (which has the 1300 MTU).  I'm interested in
> the case where the PF box is where the MTU shift occurs (due to the larger
> headers of IPv6), and so it must generate the error rather than just
> forwarding one from upstream.

See above. The test I've done indeed covers the scenario where the MTU
is too small for the IPv6 packet.

</snip>

If the too-big error comes from host RT and the error matches the state
created by the af-to rule, then the firewall will use the IPv4 address
from the state (a.k.a. the destination IP from the packet sent by the
client). In that case the client will see the too-big error as coming
from the destination host. It looks odd, but there is nothing we can do
about it.

>
>
>
> Independent of this, I wanted to provide some additional information about my
> environment, as it is not as simple as the test environment.  Our setup makes
> use of rdomains, which I did not include in the original ticket, but realize
> now does make for a different setup.  I'll try to define the topology:
>
> PF box has three physical interfaces in use:
>
>   em0 (member of trunk0)
>   em1 (member of trunk0)
>   em2 (management interface)
>
> em0/em1 are bonded using trunk(4) into interface trunk0
>
> trunk0 is connected to an upstream switch with tagged VLANs enabled, so we
> create vlan(4) interfaces on top of trunk0
>
> For the purposes of this bug, we will deal with a single vlan, vlan42
>
> vlan42 has both an IPv4 and IPv6 address assigned to it.
> Our intent is to
> use it as a kind of "hairpin CLAT"; IPv4 packets are received on vlan42's
> IPv4 interface, af-translated by PF, and emitted back on vlan42 as IPv6
> packets.  The default router on vlan42 (not managed by OpenBSD) will forward
> the packets to our NAT64 box and eventual delivery.
>
> To isolate VLANs from each other, each vlan interface is put into its own
> rtable (42, in this case).
>
> Under this setup, we do not see any ICMPv4 packet-too-big messages.  We have
> attempted packet captures on both vlan42 and on em2 (management), but have
> not seen any ICMP codes.  We even went so far as to add a IPv4 address and
> default route to em2 in case PF was sending them via the default rtable
> instead of the one assigned to the incoming interface, but nothing there
> either.
>
> We have a "pass out" default rule in our pf.conf, so I do not believe we are
> preventing any generated packets from leaving the box.
>
> I can provide full PF rules and network topology if necessary, but I think we
> should debug your test network case first to see if PF actually will generate
> a packet-too-big message before we move on to anything more complicated.
>

I understand what you are trying to do. I'm afraid I will need the
output of ifconfig and your pf.conf. Taking a brief look into the source
code, I can see icmp6_reflect() is aware of routing domains, so I would
assume things should work there too.

It would also help if you could add a network diagram showing the ICMP
packet which should generate the too-big error. The ICMP packet should
come with IP addresses, so I can better reason about pf behavior for
your setup.

thanks and
regards
sashan