jball-resetdata opened a new issue, #13171:
URL: https://github.com/apache/cloudstack/issues/13171

   # IPv6 BGP-routed Isolated network: missing `ct state established,related` INPUT rule on VR's IPv6 firewall
   
   ## Summary
   
   When creating a tenant network using an IPv6-only ROUTED + Filtered offering 
(`internetprotocol=ipv6`, `networkmode=ROUTED`, services including Firewall), 
the Virtual Router's nftables `ip6 ip6_firewall fw_input` chain has `policy 
drop` and only ICMPv6 accept rules. There is **no `ct state established,related 
accept` rule** on the public NIC.
   
   Because the VR initiates BGP outbound to upstream PE peers, the **return 
SYN-ACK is silently dropped at the v6 INPUT hook**, before TCP's MD5 
verification ever runs. BGP IPv6 sessions cannot reach `Established`.
   
   The equivalent IPv4 INPUT chain on the same VR DOES have `iifname "eth2" ct 
state related,established counter accept`, and IPv4 BGP works correctly.
   
   ## Environment
   
   - **Apache CloudStack 4.22.0.0** (live install on staging mgmt host)
   - Source analysis cross-checked against `4.20` branch HEAD `a7c2a05` — same 
bug visible in source on both branches
   - Hypervisor: KVM on Ubuntu 24.04
   - Hosts: 2-node staging cluster
   - VR systemvm template: ACS 4.20 stock
   - FRR on VR: `8.4.4`
   - Network offering: `IsolatedV6RoutedFiltered` (`internetprotocol=ipv6`, 
`routingmode=Dynamic`, `networkmode=ROUTED`, services `[UserData, Firewall, 
Dhcp, Dns]`, `egressdefaultpolicy=true`)
   - BGP peer ASN: `140646` (external)
   - ACS ASN range: `4200000001-4200000099` (32-bit private)
   - IPv6 guest prefix: `/48`
   - Reproduced on **two independent VRs** (`r-276-VM` ASN 4200000052, 
`r-278-VM` ASN 4200000081) — identical symptom, identical fix.
   
   ## Steps to reproduce
   
   1. Configure zone with IPv6 BGP routing: ASN range, BGP peers (dual-stack), 
IPv6 guest prefix `/48`.
   2. Create a network offering matching the above shape, then enable it.
   3. `createNetwork` using the offering.
   4. Deploy a VM into the network — VR is provisioned.
   5. SSH into the VR via its link-local IP (`port 3922`, systemvm key from 
`/root/.ssh/id_rsa.cloud`).
   6. Check BGP state.
   
   ## Expected
   
   ```
   $ vtysh -c "show bgp ipv6 unicast summary"
   Neighbor                         State/PfxRcd
   2400:88e0:ffff:258::2  Established     1
   2400:88e0:ffff:258::3  Established     1
   ```
   
   VR advertises tenant `/64` upstream; VMs in the network are reachable from 
the IPv6 internet.
   
   ## Actual
   
   ```
   $ vtysh -c "show bgp ipv6 unicast summary"
   Neighbor                         State/PfxRcd
   2400:88e0:ffff:258::2  Connect          0
   2400:88e0:ffff:258::3  Connect          0
   ```
   
   The IPv4 sessions on the SAME VR work normally:
   ```
   10.25.12.2  Established  PfxRcd=1
   10.25.12.3  Established  PfxRcd=1
   ```
   
   ## Diagnostic
   
   Packet capture on the hypervisor's underlay (`bond0`, VLAN 258):
   
   ```
   VR → PE: TCP SYN (port 179) with MD5
   PE → VR: TCP SYN-ACK with MD5
   VR → PE: TCP SYN retransmit (VR never sent ACK)
   PE → VR: TCP SYN-ACK retransmit
   ... cycle repeats until VR's connect timeout ...
   ```
   
   The PE responds correctly, and the return packet reaches the VR's `eth2`, 
but the VR's nftables ruleset drops it before TCP processes it.
   
   Inside the VR, the v6 firewall table:
   
   ```
   $ nft list table ip6 ip6_firewall
   table ip6 ip6_firewall {
       chain fw_input {
           type filter hook input priority filter; policy drop;
           icmpv6 type { echo-request, echo-reply, nd-router-advert,
                          nd-neighbor-solicit, nd-neighbor-advert } accept
       }
       chain fw_forward {
           type filter hook forward priority filter; policy accept;
           ct state established,related accept
           ip6 saddr <tenant-/64> jump fw_chain_egress
           ip6 daddr <tenant-/64> jump fw_chain_ingress
       }
       chain fw_chain_egress { counter accept }
       chain fw_chain_ingress {
           # tenant-configured ingress rules
           ip6 saddr ::/0 ip6 daddr ::/0 icmpv6 type { ... } accept
           ip6 saddr ::/0 ip6 daddr ::/0 tcp dport 22 accept
           counter drop
       }
   }
   ```
   
   For comparison, the IPv4 table on the same VR:
   
   ```
   $ nft list table ip ip4_firewall
   table ip ip4_firewall {
       chain INPUT {
           type filter hook input priority filter; policy drop;
           ...
           iifname "eth2" ct state established,related counter packets ... accept
           ...
       }
       ...
   }
   ```
   
   The IPv4 INPUT chain has the rule on `eth2`; the IPv6 `fw_input` chain does 
not.
   
   Kernel TCPMD5 counters are all zero, which is consistent with the drop 
happening before the TCP state machine, i.e., at netfilter.
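
The counter check above can be sketched as a small parser. This is a minimal illustration, assuming the counters in question are the `TcpExt` fields `TCPMD5NotFound`, `TCPMD5Unexpected` and `TCPMD5Failure` from `/proc/net/netstat`; the helper names are hypothetical and not part of CloudStack.

```python
# Sketch: extract the TCP-MD5 counters from /proc/net/netstat-style text.
# Helper names are hypothetical; the counter names are Linux TcpExt fields.

MD5_COUNTERS = ("TCPMD5NotFound", "TCPMD5Unexpected", "TCPMD5Failure")


def parse_tcpext(text):
    """Return {counter_name: int} for the TcpExt header/value line pair."""
    lines = [ln for ln in text.splitlines() if ln.startswith("TcpExt:")]
    if len(lines) < 2:
        return {}
    names = lines[0].split()[1:]    # first TcpExt line holds the field names
    values = lines[1].split()[1:]   # second TcpExt line holds the values
    return {name: int(value) for name, value in zip(names, values)}


def md5_counters(text):
    """Keep only the MD5-related counters; names that are absent are skipped."""
    stats = parse_tcpext(text)
    return {name: stats[name] for name in MD5_COUNTERS if name in stats}


def read_md5_counters(path="/proc/net/netstat"):
    """Read the counters from a live system (Linux only)."""
    with open(path) as handle:
        return md5_counters(handle.read())
```

On the VR, `read_md5_counters()` returning all zeros while BGP stays in `Connect` would point at a drop before TCP ever sees the segment, rather than at an MD5 key mismatch.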
   
   ## Source code root cause
   
   In `systemvm/debian/opt/cloud/bin/cs/CsAddress.py`, `fw_router_routing()` 
writes the default INPUT and FORWARD rules for **IPv4 only**:
   
   ```python
   def fw_router_routing(self):
       if self.config.is_vpc() or not self.config.is_routed():
           return
   
       # Add default rules for INPUT chain
       self.nft_ipv4_fw.append({'type': "", 'chain': 'INPUT',
                                'rule': "iifname lo counter accept"})
       self.nft_ipv4_fw.append({'type': "", 'chain': 'INPUT',
                                'rule': "iifname eth2 ct state related,established counter accept"})  # <-- this rule
       # Add default rules for FORWARD chain
       self.nft_ipv4_fw.append({'type': "", 'chain': 'FORWARD',
                  'rule': 'iifname "eth2" oifname "eth0" ct state related,established counter accept'})
       # ... more v4-only rules ...
   ```
   
   There is **no IPv6 equivalent** of this function, and nothing appends to 
`nft_ipv6_fw` anywhere. The IPv6 firewall's INPUT chain default rules are 
entirely missing for ROUTED-mode Isolated networks.
   
   `CsNetfilter.py:add_ip6_chain()` adds the `ct state established,related 
accept` rule **only** to FORWARD-hooked chains, not INPUT:
   
   ```python
   def add_ip6_chain(self, address_family, table, chain, hook, action):
       ...
       if hook == "input" or hook == "output":
           CsHelper.execute("nft add rule %s %s %s icmpv6 type { ... } accept" % ...)
       elif hook == "forward":
           CsHelper.execute("nft add rule %s %s %s ct state established,related accept" % ...)
   ```
   
   So for v6 INPUT (`fw_input` chain), only ICMPv6 is allowed and the chain 
inherits `policy drop`. The return BGP traffic never matches anything → dropped.
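
To make the drop concrete, here is a toy model of how a base chain is evaluated: rules are tried in order and the chain policy applies when none matches. It is a simplified illustration, not nftables or CloudStack code; the packet fields and function names are made up, and it assumes conntrack classifies the PE's SYN-ACK as `established` because it answers a connection the VR opened.

```python
# Toy model of base-chain evaluation (hypothetical names, not real nft code).

def evaluate(chain_rules, policy, packet):
    """Return the verdict of the first matching rule, else the chain policy."""
    for matches, verdict in chain_rules:
        if matches(packet):
            return verdict
    return policy


def is_icmpv6(pkt):
    return pkt["proto"] == "icmpv6"


def is_return_traffic_on_eth2(pkt):
    return pkt["iif"] == "eth2" and pkt["ct_state"] in ("established", "related")


# SYN-ACK from the PE, replying to the VR's own outbound BGP SYN.
syn_ack = {"iif": "eth2", "proto": "tcp", "sport": 179,
           "ct_state": "established"}

# fw_input as generated today: ICMPv6 accepts only.
fw_input_current = [(is_icmpv6, "accept")]
# fw_input with the missing rule added.
fw_input_fixed = fw_input_current + [(is_return_traffic_on_eth2, "accept")]

print(evaluate(fw_input_current, "drop", syn_ack))  # drop
print(evaluate(fw_input_fixed, "drop", syn_ack))    # accept
```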
   
   ## Reproduction confirmed across multiple VRs
   
   Tested independently on two fresh VRs in two different tenant networks. Both 
showed:
   - IPv4 BGP works (Established)
   - IPv6 BGP stuck at Connect (PfxRcd=0)
   - Same fw_input chain layout with same missing rule
   - Same fix applies
   
   ## Workaround
   
   On the running VR, apply the missing rule and restart FRR:
   
   ```bash
   nft 'add rule ip6 ip6_firewall fw_input iifname "eth2" ct state established,related counter accept'
   systemctl restart frr
   ```
   
   Within seconds, both IPv6 BGP sessions reach `Established`, tenant /64 is 
advertised, VMs become reachable from IPv6 internet. Verified end-to-end with 
SSH from public IPv6 internet to VM inside the v6-only routed network.
   
   **Caveat**: the workaround is in-memory only. Lost on:
   - VR reboot
   - Any subsequent `cmk createIpv6FirewallRule` / `cmk deleteIpv6FirewallRule` 
call (ACS regenerates the chain from its own config DB, wiping the 
manually-added rule)
   - Any other event that triggers a v6 firewall reconfiguration on the VR
   
   Each tenant FW rule change wipes the workaround. The operator has to SSH 
back into the VR and re-apply the nft rule after every FW change. This makes 
the offering effectively unusable as a customer product without the upstream 
fix.
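
Until a fix lands, the re-apply step could be made idempotent with a small check. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the chain listing is the text printed by `nft list chain ip6 ip6_firewall fw_input`.

```python
import re

# The workaround rule from above, as a single nft command string.
ADD_RULE_COMMAND = ('add rule ip6 ip6_firewall fw_input iifname "eth2" '
                    'ct state established,related counter accept')


def has_input_ct_rule(chain_listing, iface="eth2"):
    """True if the listing already accepts established/related on iface."""
    pattern = re.compile(
        r'iifname\s+"?%s"?\s+ct\s+state\s+'
        r'(?:established,related|related,established)\b.*\baccept\b'
        % re.escape(iface))
    return any(pattern.search(line) for line in chain_listing.splitlines())


def reapply_command(chain_listing):
    """Return the nft argv that re-adds the rule, or None if it is present."""
    if has_input_ct_rule(chain_listing):
        return None
    return ["nft", ADD_RULE_COMMAND]
```

An operator script could feed the listing into `reapply_command()` after every firewall change and pass a non-`None` result to `subprocess.run`. This only narrows the window in which the rule is missing; it does not replace the upstream fix.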
   
   ## Proposed fix — VALIDATED on a live VR
   
   Add a v6 equivalent of `fw_router_routing()` in 
`systemvm/debian/opt/cloud/bin/cs/CsAddress.py` plus expose `nft_ipv6_fw` on 
`CsIP`. `nft_ipv6_fw` already exists on `CsConfig` (line 43); we just need to 
plumb it through CsIP and write into it.
   
   Three changes in `CsAddress.py`:
   
   **1. Add reference in `CsIP.__init__` (around line 312):**
   ```diff
            self.nft_ipv4_fw = config.get_nft_ipv4_fw()
            self.nft_ipv4_acl = config.get_nft_ipv4_acl()
   +        self.nft_ipv6_fw = config.get_ipv6_fw()
   ```
   
   **2. Add new `fw_router_routing_v6()` method (immediately before 
`fw_vpcrouter_routing` at line 674):**
   ```python
   def fw_router_routing_v6(self):
       if self.config.is_vpc() or not self.config.is_routed():
           return
       # IPv6 INPUT chain defaults, mirroring fw_router_routing() for v4.
       # Without these, return traffic for VR-initiated v6 connections
       # (BGP etc.) is silently dropped by the default-DROP policy.
       self.nft_ipv6_fw.append({'type': "", 'chain': 'fw_input',
                                'rule': "iifname lo counter accept"})
       self.nft_ipv6_fw.append({'type': "", 'chain': 'fw_input',
                                'rule': "iifname eth2 ct state established,related counter accept"})
       if self.get_type() in ["guest"]:
           self.nft_ipv6_fw.append({'type': "", 'chain': 'fw_input',
                                    'rule': "iifname %s ct state established,related counter accept" % self.dev})
   ```
   
   **3. Call it from `CsIP.configure()` (lines 756-757):**
   ```diff
            self.fw_router_routing()
            self.fw_vpcrouter_routing()
   +        self.fw_router_routing_v6()
   ```
   
   Note: `eth2` is hardcoded matching the v4 convention (and 
`PUBLIC_INTERFACES["router"]` in `CsHelper.py`). A more robust fix could 
reference that constant.
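
A rough sketch of that more robust variant follows. It is standalone for illustration: `PUBLIC_INTERFACES` here is a local stand-in whose value is assumed from the note above, `ipv6_input_defaults` is a hypothetical helper, and `eth0` is used as an example guest device only.

```python
# Stand-in for the constant in CsHelper.py; the value is an assumption
# taken from the note above, not copied from the source tree.
PUBLIC_INTERFACES = {"router": "eth2"}


def ipv6_input_defaults(public_dev, guest_dev=None):
    """Build the fw_input rule dicts without hardcoding the device name."""
    rules = [
        "iifname lo counter accept",
        "iifname %s ct state established,related counter accept" % public_dev,
    ]
    if guest_dev:
        rules.append(
            "iifname %s ct state established,related counter accept" % guest_dev)
    return [{'type': "", 'chain': 'fw_input', 'rule': rule} for rule in rules]


# Example: public device from the constant, eth0 as an example guest device.
for entry in ipv6_input_defaults(PUBLIC_INTERFACES["router"], guest_dev="eth0"):
    print(entry['rule'])
```

Inside `CsIP`, the returned dicts would be appended to `self.nft_ipv6_fw` in the same way as the patch above does.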
   
   ### Validation
   
   Applied this patch in-place on a running VR (`r-278-VM`, ACS 4.22.0.0) on 
2026-05-16:
   
   1. Pre-patch: v6 BGP stuck in Connect; v6 fw_input chain had only ICMPv6 
accept
   2. Patch applied; `/opt/cloud/bin/configure.py cmd_line.json` triggered 
re-process
   3. fw_input chain now includes `iifname "eth2" ct state established,related 
counter accept`
   4. v6 BGP sessions Established within seconds, PfxRcd=1, PfxSnt=2
   
   **Survival test (the key one)**: After patch, ran `cmk 
createIpv6FirewallRule networkid=<net> traffictype=Ingress protocol=tcp 
startport=80 endport=80` — this pushes `ipv6_firewall_rules.json` to the VR and 
triggers the full IpTablesExecutor flush+rebuild path that previously wiped the 
manual nft workaround. **After the FW change:**
   - `iifname "eth2" ct state established,related accept` rule **persists** in 
fw_input (with active counters)
   - Both v6 BGP sessions **still Established**
   - End-to-end SSH from public IPv6 internet to VM in the network **still 
works**
   
   This confirms the fix is correct and durable. The bug is in CsAddress.py / 
`nft_ipv6_fw` not being populated; the rest of the pipeline handles the v6 list 
correctly once it has content.
   
   ### VPC equivalent
   
   The same gap likely exists in the VPC routed path (`fw_vpcrouter_routing` at 
line 674). Not tested here (our setup is non-VPC Isolated) but worth a 
symmetric audit.
   
   ## Affected versions
   
   **Verified on Apache CloudStack 4.22.0.0** (latest LTS at time of filing). 
PR #10970, which added the equivalent FORWARD-chain rule, is present and active 
in this build. However, the PR's second commit ("Remove rule from input chain") 
suggests the INPUT-chain rule was dropped before merge, leaving this gap.
   
   Affected versions (by code inspection + PR #10970 history):
   - 4.20.2, 4.20.3, 4.21.x, 4.22.0.0, 4.22.0.1: all affected
   - 4.20.0, 4.20.1: also affected, and more broadly, because PR #10970 had not 
yet been merged, so both the FORWARD and INPUT chains were missing the rule
   
   ## Severity
   
   **High** for anyone wanting to deploy IPv6-only ROUTED Isolated networks at 
scale. The feature appears to work (offering enables, network creates, VR 
provisions, BGP-v4 establishes) but tenant v6 traffic doesn't route because 
BGP-v6 silently fails. Diagnosis requires packet captures on the underlay — not 
obvious from the VR's own view.
   
   ## Related
   
   - **PR #10970** ("IPv6 firewall: accept packets from related and established 
connections") — landed in 4.20.2 and 4.22.0.0 — added the equivalent rule to 
the **FORWARD** chain only. This fixed the VM-return-traffic case (downloads, 
etc.) but did NOT add the rule to the **INPUT** chain, leaving the VR's own 
outbound BGP return traffic still dropped. The PR discussion mentions a second 
commit "Remove rule from input chain" — suggesting an earlier draft did add the 
INPUT rule but it was removed in review. The bug described here is the 
consequence of that removal: VR-originated v6 connections (BGP, but also NTP, 
DNS lookups, etc., that the systemvm itself initiates outbound) fail on the 
return.
   - `IsolatedV6RoutedFiltered` offering — affected
   - `IsolatedV6RoutedOffering` (no Firewall service) — not affected (no 
firewall service means no `ip6_firewall` table; v6 BGP works there because no 
nftables drop happens)
   - IPv4 ROUTED with same offering shape — works as expected (different code 
path: `fw_router_routing()` in `CsAddress.py` writes the INPUT `iifname "eth2" 
ct state related,established` rule for v4)
   


