Hi Andrea,
Thank you for the detailed review. I intend to address every point
below in the rework agreed in the cover letter thread (one RFC series
per behavior, on a new lwtunnel encap type with its own SEG6_MOBILE_*
attribute namespace); answers to your questions inline.
> There are three issues with the End.M.GTP4.E UAPI behavior:
>
> (1) In the example, v4_mask_len = 32 and the src value is required but not
> used.
[...]
> (2) With v4_mask_len < 32, the low bits of the IPv4 SA that the inbound
> packet does not carry come from src.
[...]
> (3) With v4_mask_len < 32, the low bits of the IPv4 DA that the SID does
> not carry are set to zero.
[...]
> Together with the IPv4 SA behavior in (2), does it make sense to support
> v4_mask_len values other than 32?
Agreed on all three, and no, I see no practical use for v4_mask_len
values other than 32. I will drop both v4_mask_len and the src
template from End.M.GTP4.E: the next version will recover the full
32-bit IPv4 DA from the SID right after the locator, and the full
32-bit IPv4 SA from the inbound IPv6 SA at bit offset
v6_src_prefix_len (default 64), per RFC 9433 Figures 9 and 10. That
leaves no required attribute whose value can be unused or partially
discarded.
> This check cannot trigger at runtime.
> seg6_mobile_v4_validate() already rejects v6_src_prefix_len > 96 at install
> time, and the default (64) leaves p_bits at most 96.
> If kept for defense in depth, or if you plan to reuse this helper
> elsewhere, the silent return of 0 produces an outgoing IPv4 packet with SA
> = 0.0.0.0 without any error.
> Nit: the (unsigned int) cast on p_bits can be removed.
Will fix: the range will be enforced only at build_state time with an
extack message, so the data path helper cannot fail silently, and the
cast will go away.
> seg6_mobile_skb_prefix_bits() reads the matched route prefix length from
> the FIB on every packet. [...]
> Caching it in slwt->mobile_info at validate would let the data path read
> minfo->locator_bits directly.
[...]
> The operator cannot configure the IPv4 DA to start at a position different
> from where the route prefix ends. Is this intentional?
It was, but it does not survive your first point: deriving the
locator length from the matched route prefix couples the SID layout
to the routing table and costs a per-packet FIB read. I will replace
it with an explicit locator-length attribute (sr_prefix_len, validated
at build_state time), which removes the per-packet rcu/container_of
walk and the 128 fallback, and also decouples the IPv4 DA position
from the route prefix, so one route can cover SIDs whose locator is
longer than the route prefix.
> Nit: skb_push() returns void * so the explicit casts on two of the three
> calls are unnecessary.
Will fix.
> orig_dst may be NULL or stale. This is an NF_HOOK finish callback, and
> Netfilter processing during NF_INET_PRE_ROUTING can drop or replace the
> skb dst. skb->cb is not guaranteed to be untouched (IPCB/IP6CB aliases it).
As agreed in the cover letter thread, the initial per-behavior series
will drop the NF_HOOK split entirely: each behavior becomes a single
input function with no skb->cb context and no finish callback. To be
revisited on top of your seg6 netfilter fix.
> Same four-sizeof sum as ovhd above. Move ovhd to function scope to avoid
> the repetition.
Will fix; with iptunnel_handle_offloads() below, the GSO-vs-MTU check
itself goes away, leaving a single worst-case skb_cow_head().
> input_action_end_m_gtp4_e_finish() never calls iptunnel_handle_offloads(),
> so GSO of the outer UDP tunnel cannot operate correctly. The call should
> go between skb_cow_head() and seg6_mobile_push_gtpu().
[...]
> The fix needs end-to-end testing to confirm it works correctly.
Will do exactly as you describe. I will also add a GSO case to the
per-behavior selftest so the fix gets end-to-end coverage.
> The caller already reserves worst-case headroom, so the skb_cow_head()
> calls inside seg6_mobile_push_gtpu() are always no-ops. Could they be
> removed?
Yes, will remove them and make the helper non-failing; the caller
reserves the worst case once.
> Does this mirror the drivers/net/gtp.c data-path convention
> (src == dst == 2152)?
Yes -- drivers/net/gtp.c transmits from its bound socket port, so its
packets also leave with src == dst == 2152.
> The comment says the packet "traverses NF_INET_LOCAL_OUT", but dst_output()
> goes through NF_INET_POST_ROUTING only. [...]
> IMHO, gateways that do protocol transformation can set rp_filter=0. Using
> ip_route_input() + dst_input() and documenting the rp_filter requirement is
> preferable to silently bypassing the FORWARD chain.
Agreed; will switch to ip_route_input() + dst_input() so the rebuilt
IPv4 packet traverses NF_INET_PRE_ROUTING and NF_INET_FORWARD like
the other behaviors, and will document the rp_filter=0 requirement on
the ingress device (code comment and the iproute2 man page).
> The anonymous { } scope block should be avoided. rt and fl4 should be
> declared at the function top. The same pattern appears in other patches
> of this series.
[...]
> Variable declarations are not in reverse Christmas tree order. Same issue
> in the other functions introduced by this patch.
Will fix both across the series, splitting the egress half into its
own helper with all declarations at function top.
> This pskb_may_pull pulls the OUTER IPv6 header, not the inner T-PDU.
> BAD_INNER is the wrong drop reason here. [...]
> Same BAD_INNER misuse: ipv6_skip_exthdr() is parsing the OUTER IPv6
> extension headers [...]
Right. Per the cover letter thread the MUP-specific drop reasons will
be left out of the initial per-behavior series; the behaviors will
adopt the SRv6-level reasons from your prep series once it lands, and
BAD_SID / BAD_GTPU will return with correct scoping at that point.
> frag_off is not checked after ipv6_skip_exthdr(). [...] Adding
> "if (frag_off) goto drop;" after the ipv6_skip_exthdr() call would handle
> this.
Will add the check, plus a fragmented-input case to the selftests as
discussed in the cover letter thread.
> Same seg6_mobile_get_validated_srh() bug described in my reply to patch 1.
Will fix as discussed there: the helper will return a three-state
result (absent / present / malformed) so an absent SRH is
distinguishable from a malformed one, with HMAC validated whenever an
SRH is present.
> RFC 6040 Section 1.1 scopes the document to ECN field processing in
> IP-in-IP tunnels. The patch cites it for DSCP and Hop Limit/TTL on a
> protocol conversion (the IPv6 outer is popped, not encapsulated). Could
> you clarify the RFC 6040 reference?
You are right, the citation was over-scoped. I will drop the
reference and describe the DSCP/ECN and Hop Limit to TTL propagation
in plain words.
> The inner_nfproto-based size selection appears several times:
> pskb_may_pull, skb->protocol, and skb_set_transport_header. Computing a
> local inner_hdr_len once inside the switch would replace all three. Same
> pattern in every other behavior.
Will fix in every behavior.
> parse_nla_mobile_pdu_type() accepts the full 4-bit range 0..15, but the
> function comment in seg6_mobile_push_gtpu() notes that only 0 (downlink)
> and 1 (uplink) have a defined meaning.
> RFC 9433 describes the E behaviors in the downlink packet flows (Section
> 5.3.1.2 and 5.3.2.2). How are PDU Type 1 (uplink) and the reserved values
> supposed to be used with the GTP*.E behaviors?
> UAPI cannot be tightened after merge. IMHO, if a type is not supported
> yet, the parser should reject it and notify userspace.
Agreed. RFC 9433 describes End.M.GTP4.E only in the downlink toward
the gNB (Sections 6.6 and 5.3.2.2), so the End.M.GTP4.E series will
accept only PDU Type 0 (DL) and reject everything else with an
extack message. PDU Type 1 (UL) belongs to End.M.GTP6.E, which
regenerates uplink GTP-U in the drop-in mode (Section 5.4: "There is
one instance of the End.M.GTP6.E SID per PDU type"), so it will be
enabled by that series; the reserved values 2..15 stay rejected
there too.
Thanks,
Yuya