Re: pf af-to silently dropping packets that are too big after translation

Alexandr Nedvedicky Mon, 02 Sep 2024 08:49:02 -0700

Hello,

On Fri, Aug 30, 2024 at 08:12:52PM +0000, Jason Healy wrote:
> Good afternoon,
>
> Thank you for your analysis, and my apologies for the slow reply.  I have


    no problem at all.

</snip>
> Your first packet capture (icmp.pkt) contained packet-too-big messages like 
> this:
>
> > 10.188.210.10 > 10.188.211.123: icmp: echo request (DF)
> > 10.188.211.123 > 10.188.210.10: icmp: \
> >     10.188.211.123 unreachable - need to frag (mtu 1480) (DF)
>
> The source of the packet-too-big is the target of the ping (10.188.211.123)
> and not the PF router (10.188.210.50; not explicitly stated but taken from
> your topology document).  So while a packet-too-big was received, it was not
> generated by the router itself (is the link MTU between PF and the test box
> greater than 1500?)
>

    Things are more subtle here. The short answer is that the ICMP error comes
    from the firewall (PF box). Let me explain what's happening. Referring to
    the topology found in OpenBSD regress [1], we need to look at the PF box
    more closely.

        in                                      <->     out
        10.188.210.50                                   10.188.211.50
        fdd7:e83e:66bc:210:5054:ff:fe12:3450            fdd7:e83e:66bc:211:5054:ff:fe12:3450

    PF acts as a router for the 10.188.211.0/24 subnet. There is also a
    rule which does AF-translation for packets with destination 10.188.211.123
    to fdd7:e83e:66bc:211:5054:ff:fe12:3451 (the IPv6 address of the RT box).

    The ICMP echo request to 10.188.211.123 enters PF at the in interface.
    It is an inbound packet which matches the af-to rule. PF creates a state
    and changes the destination IP address (and family) to
    fdd7:e83e:66bc:211:5054:ff:fe12:3451.
    The source address comes from the rule. The rule uses the IPv6 address
    of the out interface (fdd7:e83e:66bc:211:5054:ff:fe12:3450).
    The firewall then injects the packet into the IP stack to route it, at
    line 8004 here:

7984                                 ip_forward(pd.m, ifp, NULL, flags);
7985                         } else
7986                                 ip_output(pd.m, NULL, NULL, 0, NULL, NULL, 0);
7987                         break;
7988                 case AF_INET6:
7989                         if (pd.dir == PF_IN) {
7990                                 int flags = IPV6_REDIRECT;
7991
7992                                 switch (atomic_load_int(&ip6_forwarding)) {
7993                                 case 2:
7994                                         SET(flags, IPV6_FORWARDING_IPSEC);
7995                                         /* FALLTHROUGH */
7996                                 case 1:
7997                                         SET(flags, IPV6_FORWARDING);
7998                                         break;
7999                                 default:
8000                                         ip6stat_inc(ip6s_cantforward);
8001                                         action = PF_DROP;
8002                                         goto out;
8003                                 }
8004                                 ip6_forward(pd.m, NULL, flags);
8005                         } else
8006                                 ip6_output(pd.m, NULL, NULL, 0, NULL, NULL);
8007                         break;

    Note: this call to ip6_forward() happens for AF-translated packets only.
    The caller of pf_test() sees the packet as discarded.

    ip6_forward() finds the route (transmit interface) for the packet with
    destination fdd7:e83e:66bc:211:5054:ff:fe12:3451. As it attempts to
    transmit the packet, it finds out the packet does not fit on the wire.
    It uses icmp6_error()/icmp6_reflect() to generate an ICMPv6 error.
    The ICMPv6 packet looks like this:
        fdd7:e83e:66bc:211:5054:ff:fe12:3451 -> fdd7:e83e:66bc:211:5054:ff:fe12:3450
        (IPv6 address of in @ RT)               (IPv6 address of out @ PF)
    The packet is injected into the IP stack on the PF box by a task. The
    firewall intercepts such a packet as outbound. The first thing the
    firewall does is try to find a matching state. It finds the state created
    by the ICMP request. The found state also indicates the packet needs to
    be translated back to IPv4.
    The address 10.188.211.123 comes from the state.
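
    To make the size arithmetic concrete, here is a minimal sketch (it is
    not code from pf). It assumes an IPv4 header without options (20 bytes),
    the fixed 40-byte IPv6 header, and a 1500-byte MTU on the out link, which
    this thread does not state. The helper names are made up for
    illustration. The MTU adjustment follows the usual rule for translating
    an ICMPv6 too-big error to ICMPv4: subtract the header size difference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Header sizes: IPv4 without options (assumption), fixed IPv6 header. */
#define IPV4_HDR_LEN	20u
#define IPV6_HDR_LEN	40u
#define AF_TO_GROWTH	(IPV6_HDR_LEN - IPV4_HDR_LEN)

/* Hypothetical helper: total packet length after 4-to-6 translation. */
static inline uint32_t
v6_len_after_af_to(uint32_t v4_total_len)
{
	assert(v4_total_len >= IPV4_HDR_LEN);
	return v4_total_len + AF_TO_GROWTH;
}

/* Hypothetical helper: does the translated packet fit the outgoing link? */
static inline bool
fits_after_af_to(uint32_t v4_total_len, uint32_t link_mtu)
{
	return v6_len_after_af_to(v4_total_len) <= link_mtu;
}

/*
 * Hypothetical helper: MTU reported to the IPv4 client when an ICMPv6
 * too-big error is translated back to an ICMPv4 "need to frag" error.
 */
static inline uint32_t
icmp4_mtu_from_icmp6(uint32_t icmp6_mtu)
{
	assert(icmp6_mtu >= AF_TO_GROWTH);
	return icmp6_mtu - AF_TO_GROWTH;
}
```

    Under those assumptions a 1500-byte IPv4 packet grows to 1520 bytes and
    no longer fits, and a too-big error for a 1500-byte link reaches the
    client as "mtu 1480", which is consistent with the first capture.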

    I agree the translation could be smarter here, because the mbuf which
    represents the packet should still carry the PF_TAG_GENERATED flag. The
    firewall can use it to distinguish local ICMP errors from those which
    come from remote boxes. If the PF_TAG_GENERATED flag is present, the
    firewall should pick the source address from the local interface instead
    of using the IPv4 address found in the state. But this is a detail,
    because for remote hosts the firewall would have to use the IPv4 address
    which comes from the state anyway.
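
    The idea can be sketched roughly as follows. This is only an
    illustration, not a patch: the struct, the helper name and the
    SKETCH_TAG_GENERATED constant are invented here and merely stand in for
    the mbuf flag and the state entry, and addresses are plain host-order
    integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PF_TAG_GENERATED; the value is arbitrary in this sketch. */
#define SKETCH_TAG_GENERATED	0x01u

/* Hypothetical, simplified view of what the translation step would see. */
struct sketch_icmp_err {
	uint8_t		pkt_flags;	/* flags carried with the packet */
	uint32_t	state_v4_addr;	/* IPv4 address kept in the state */
	uint32_t	local_v4_addr;	/* IPv4 address of local interface */
};

/*
 * Pick the IPv4 source for the translated error: the local interface
 * address for locally generated errors, the state address otherwise.
 */
static inline uint32_t
sketch_pick_v4_src(const struct sketch_icmp_err *e)
{
	assert(e != NULL);
	if (e->pkt_flags & SKETCH_TAG_GENERATED)
		return e->local_v4_addr;
	return e->state_v4_addr;
}
```

    With the flag set, the client would see the error as coming from the PF
    box. Without it, the client sees the address from the state
    (10.188.211.123 in the test above), which is the current behavior.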



[1] https://github.com/openbsd/src/blob/master/regress/sys/net/pf_forward/Makefile

> Your second packet capture (icmp-eco.pkt) contained packet-too-big messages 
> like this:
>
> > 10.188.210.10 > 10.188.212.52: icmp: echo request (DF)
> > 10.188.211.51 > 10.188.210.10: icmp: \
> >     10.188.212.52 unreachable - need to frag (mtu 1300)
>
> Here again we have received a packet-too-big message, but it wasn't generated
> by the PF box but rather by RT (which has the 1300 MTU).  I'm interested in
> the case where the PF box is where the MTU shift occurs (due to the larger
> headers of IPv6), and so it must generate the error rather than just
> forwarding one from upstream.

    See above. The test I've done indeed covers the scenario
    where the MTU is too small for the IPv6 packet.

</snip>

    If the too-big error comes from host RT and the error
    matches the state created by the af-to rule, then the firewall will
    use the IPv4 address from the state (a.k.a. the destination IP of the
    packet sent by the client). In that case the client will see the too-big
    error as coming from the destination host. It looks odd, but there is
    nothing we can do about it.

>
>
>
> Independent of this, I wanted to provide some additional information about my
> environment, as it is not as simple as the test environment.  Our setup makes
> use of rdomains, which I did not include in the original ticket, but realize
> now does make for a different setup.  I'll try to define the topology:
>
> PF box has three physical interfaces in use:
>
> em0 (member of trunk0)
> em1 (member of trunk0)
> em2 (management interface)
>
> em0/em1 are bonded using trunk(4) into interface trunk0
>
> trunk0 is connected to an upstream switch with tagged VLANs enabled, so we
> create vlan(4) interfaces on top of trunk0
>
> For the purposes of this bug, we will deal with a single vlan, vlan42
>
> vlan42 has both an IPv4 and IPv6 address assigned to it.  Our intent is to
> use it as a kind of "hairpin CLAT"; IPv4 packets are received on vlan42's
> IPv4 interface, af-translated by PF, and emitted back on vlan42 as IPv6
> packets.  The default router on vlan42 (not managed by OpenBSD) will forward
> the packets to our NAT64 box and eventual delivery.
>
> To isolate VLANs from each other, each vlan interface is put into its own
> rtable (42, in this case).
>
> Under this setup, we do not see any ICMPv4 packet-too-big messages.  We have
> attempted packet captures on both vlan42 and on em2 (management), but have
> not seen any ICMP codes.  We even went so far as to add a IPv4 address and
> default route to em2 in case PF was sending them via the default rtable
> instead of the one assigned to the incoming interface, but nothing there
> either.
>
> We have a "pass out" default rule in our pf.conf, so I do not believe we are
> preventing any generated packets from leaving the box.
>
> I can provide full PF rules and network topology if necessary, but I think we
> should debug your test network case first to see if PF actually will generate
> a packet-too-big message before we move on to anything more complicated.
>

    I understand what you are trying to do. I'm afraid I will need the
    output of ifconfig and pf.conf. Taking a brief look into the source
    code, I can see icmp6_reflect() is aware of routing domains, so I would
    assume things should work too.

    Also, it would help if you could add a network diagram with the ICMP
    packet which should generate the too-big error. The ICMP packet should
    come with IP addresses so I can better reason about pf behavior for your
    setup.

thanks and
regards
sashan
