jball-resetdata opened a new issue, #13171:
URL: https://github.com/apache/cloudstack/issues/13171
# IPv6 BGP-routed Isolated network: missing `ct state established,related` INPUT rule on VR's IPv6 firewall
## Summary
When creating a tenant network using an IPv6-only ROUTED + Filtered offering
(`internetprotocol=ipv6`, `networkmode=ROUTED`, services including Firewall),
the Virtual Router's nftables `ip6 ip6_firewall fw_input` chain has `policy
drop` and only ICMPv6 accept rules. There is **no `ct state established,related
accept` rule** on the public NIC.
Because the VR initiates BGP outbound to upstream PE peers, the **return
SYN-ACK is silently dropped at the v6 INPUT hook**, before TCP's MD5
verification ever runs. BGP IPv6 sessions cannot reach `Established`.
The equivalent IPv4 INPUT chain on the same VR DOES have `iifname "eth2" ct
state related,established counter accept`, and IPv4 BGP works correctly.
## Environment
- **Apache CloudStack 4.22.0.0** (live install on staging mgmt host)
- Source analysis cross-checked against `4.20` branch HEAD `a7c2a05` — same
bug visible in source on both branches
- Hypervisor: KVM on Ubuntu 24.04
- Hosts: 2-node staging cluster
- VR systemvm template: ACS 4.20 stock
- FRR on VR: `8.4.4`
- Network offering: `IsolatedV6RoutedFiltered` (`internetprotocol=ipv6`,
`routingmode=Dynamic`, `networkmode=ROUTED`, services `[UserData, Firewall,
Dhcp, Dns]`, `egressdefaultpolicy=true`)
- BGP peer ASN: `140646` (external)
- ACS ASN range: `4200000001-4200000099` (32-bit private)
- IPv6 guest prefix: `/48`
- Reproduced on **two independent VRs** (`r-276-VM` ASN 4200000052,
`r-278-VM` ASN 4200000081) — identical symptom, identical fix.
## Steps to reproduce
1. Configure zone with IPv6 BGP routing: ASN range, BGP peers (dual-stack),
IPv6 guest prefix `/48`.
2. Create a network offering matching the above shape, then enable it.
3. `createNetwork` using the offering.
4. Deploy a VM into the network — VR is provisioned.
5. SSH into the VR via its link-local IP (`port 3922`, systemvm key from
`/root/.ssh/id_rsa.cloud`).
6. Check BGP state.
## Expected
```
$ vtysh -c "show bgp ipv6 unicast summary"
Neighbor State/PfxRcd
2400:88e0:ffff:258::2 Established 1
2400:88e0:ffff:258::3 Established 1
```
VR advertises tenant `/64` upstream; VMs in the network are reachable from
the IPv6 internet.
## Actual
```
$ vtysh -c "show bgp ipv6 unicast summary"
Neighbor State/PfxRcd
2400:88e0:ffff:258::2 Connect 0
2400:88e0:ffff:258::3 Connect 0
```
The IPv4 sessions on the SAME VR work normally:
```
10.25.12.2 Established PfxRcd=1
10.25.12.3 Established PfxRcd=1
```
## Diagnostic
Packet capture on the hypervisor's underlay (`bond0`, VLAN 258):
```
VR → PE: TCP SYN (port 179) with MD5
PE → VR: TCP SYN-ACK with MD5
VR → PE: TCP SYN retransmit (VR never sent ACK)
PE → VR: TCP SYN-ACK retransmit
... cycle repeats until VR's connect timeout ...
```
The PE responds correctly and the return packet reaches the VR's `eth2`, but
the VR's nftables ruleset drops it before TCP processes it.
Inside the VR, the v6 firewall table:
```
$ nft list table ip6 ip6_firewall
table ip6 ip6_firewall {
    chain fw_input {
        type filter hook input priority filter; policy drop;
        icmpv6 type { echo-request, echo-reply, nd-router-advert, nd-neighbor-solicit, nd-neighbor-advert } accept
    }
    chain fw_forward {
        type filter hook forward priority filter; policy accept;
        ct state established,related accept
        ip6 saddr <tenant-/64> jump fw_chain_egress
        ip6 daddr <tenant-/64> jump fw_chain_ingress
    }
    chain fw_chain_egress { counter accept }
    chain fw_chain_ingress {
        # tenant-configured ingress rules
        ip6 saddr ::/0 ip6 daddr ::/0 icmpv6 type { ... } accept
        ip6 saddr ::/0 ip6 daddr ::/0 tcp dport 22 accept
        counter drop
    }
}
```
For comparison, the IPv4 table on the same VR:
```
$ nft list table ip ip4_firewall
table ip ip4_firewall {
    chain INPUT {
        type filter hook input priority filter; policy drop;
        ...
        iifname "eth2" ct state established,related counter packets ... accept
        ...
    }
    ...
}
```
The IPv4 INPUT chain has the rule on `eth2`; the IPv6 `fw_input` chain does
not.
Kernel TCPMD5 counters are all zero, which indicates the drop happens before
the TCP state machine, i.e., at netfilter.
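
The comparison between the two listings can be automated. The following is a minimal sketch, not part of CloudStack: it scans the text of an `nft list table` dump and reports any input-hooked, `policy drop` chain that has no `ct state established,related ... accept` rule. The function name and the string-matching heuristic are assumptions made for illustration, and it expects each rule on a single line, as `nft` prints it.

```python
import re
from typing import List


def input_chains_missing_conntrack(listing: str) -> List[str]:
    """Return input-hooked, policy-drop chains that lack a conntrack accept rule."""
    missing = []
    chain = None           # name of the chain currently being read
    is_input_drop = False  # chain hooks input with policy drop
    has_ct_accept = False  # chain accepts established/related traffic
    for raw in listing.splitlines():
        line = raw.strip()
        opened = re.match(r"chain\s+(\S+)\s*\{", line)
        if opened:
            chain, is_input_drop, has_ct_accept = opened.group(1), False, False
            continue
        if chain is None:
            continue
        if line.startswith("type ") and "hook input" in line and "policy drop" in line:
            is_input_drop = True
        elif "ct state" in line and "established" in line and "accept" in line:
            has_ct_accept = True
        elif line == "}":
            if is_input_drop and not has_ct_accept:
                missing.append(chain)
            chain = None
    return missing


# Trimmed sample in the shape of the ip6_firewall listing above.
SAMPLE_V6 = """table ip6 ip6_firewall {
    chain fw_input {
        type filter hook input priority filter; policy drop;
        icmpv6 type { echo-request, echo-reply } accept
    }
    chain fw_forward {
        type filter hook forward priority filter; policy accept;
        ct state established,related accept
    }
}"""

print(input_chains_missing_conntrack(SAMPLE_V6))
```

On a VR, the input would be the output of `nft list table ip6 ip6_firewall`; an empty list means every drop-policy input chain already accepts return traffic.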
## Source code root cause
In `systemvm/debian/opt/cloud/bin/cs/CsAddress.py`, `fw_router_routing()`
writes the default INPUT and FORWARD rules for **IPv4 only**:
```python
def fw_router_routing(self):
    if self.config.is_vpc() or not self.config.is_routed():
        return
    # Add default rules for INPUT chain
    self.nft_ipv4_fw.append({'type': "", 'chain': 'INPUT',
                             'rule': "iifname lo counter accept"})
    self.nft_ipv4_fw.append({'type': "", 'chain': 'INPUT',
                             'rule': "iifname eth2 ct state related,established counter accept"})  # <-- this rule
    # Add default rules for FORWARD chain
    self.nft_ipv4_fw.append({'type': "", 'chain': 'FORWARD',
                             'rule': 'iifname "eth2" oifname "eth0" ct state related,established counter accept'})
    # ... more v4-only rules ...
```
There is **no IPv6 equivalent** of this function — `nft_ipv6_fw` is not
appended-to anywhere. The IPv6 firewall's INPUT chain default rules are
entirely missing for ROUTED-mode Isolated networks.
`CsNetfilter.py:add_ip6_chain()` adds the `ct state established,related
accept` rule **only** to FORWARD-hooked chains, not INPUT:
```python
def add_ip6_chain(self, address_family, table, chain, hook, action):
    ...
    if hook == "input" or hook == "output":
        CsHelper.execute("nft add rule %s %s %s icmpv6 type { ... } accept" % ...)
    elif hook == "forward":
        CsHelper.execute("nft add rule %s %s %s ct state established,related accept" % ...)
```
So for v6 INPUT (`fw_input` chain), only ICMPv6 is allowed and the chain
inherits `policy drop`. The return BGP traffic never matches anything → dropped.
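
To make the consequence concrete, here is a toy model of first-match chain evaluation. It is a deliberate simplification and not how nftables is implemented: rules are plain Python predicates, packets are dicts, and the field names (`proto`, `iif`, `ct_state`) are invented for this sketch. It relies on conntrack classifying the SYN-ACK reply to a locally initiated SYN as `established`.

```python
def evaluate(chain_rules, policy, packet):
    """Return the verdict of the first matching rule, else the chain policy."""
    for predicate, verdict in chain_rules:
        if predicate(packet):
            return verdict
    return policy


def is_icmpv6(pkt):
    return pkt["proto"] == "icmpv6"


def is_reply_on_eth2(pkt):
    return pkt["iif"] == "eth2" and pkt["ct_state"] in ("established", "related")


# fw_input as generated today: ICMPv6 only, policy drop.
current_fw_input = [(is_icmpv6, "accept")]
# fw_input with the missing conntrack rule added.
fixed_fw_input = current_fw_input + [(is_reply_on_eth2, "accept")]

# SYN-ACK from the PE answering the VR's outbound BGP SYN.
syn_ack = {"proto": "tcp", "iif": "eth2", "sport": 179, "ct_state": "established"}
# Unsolicited inbound TCP connection attempt to the VR.
unsolicited = {"proto": "tcp", "iif": "eth2", "sport": 40000, "ct_state": "new"}

print(evaluate(current_fw_input, "drop", syn_ack))
print(evaluate(fixed_fw_input, "drop", syn_ack))
print(evaluate(fixed_fw_input, "drop", unsolicited))
```

In this model the added rule only admits replies to connections the VR itself opened; unsolicited inbound TCP still falls through to `policy drop`.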
## Reproduction confirmed across multiple VRs
Tested independently on two fresh VRs in two different tenant networks. Both
showed:
- IPv4 BGP works (Established)
- IPv6 BGP stuck at Connect (PfxRcd=0)
- Same fw_input chain layout with same missing rule
- Same fix applies
## Workaround
On the running VR, apply the missing rule and restart FRR:
```bash
nft 'add rule ip6 ip6_firewall fw_input iifname "eth2" ct state established,related counter accept'
systemctl restart frr
```
Within seconds, both IPv6 BGP sessions reach `Established`, tenant /64 is
advertised, VMs become reachable from IPv6 internet. Verified end-to-end with
SSH from public IPv6 internet to VM inside the v6-only routed network.
**Caveat**: the workaround only changes the running ruleset and is not persisted. It is lost on:
- VR reboot
- Any subsequent `cmk createIpv6FirewallRule` / `cmk deleteIpv6FirewallRule`
call (ACS regenerates the chain from its own config DB, wiping the
manually-added rule)
- Any other event that triggers a v6 firewall reconfiguration on the VR
Each tenant FW rule change wipes the workaround. The operator has to SSH
back into the VR and re-apply the nft rule after every FW change. This makes
the offering effectively unusable as a customer product without the upstream
fix.
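
Until a fix lands, re-applying the rule can at least be made idempotent. The sketch below is a stopgap under stated assumptions, not something CloudStack ships: `ensure_rule` and `run_nft` are hypothetical helper names, the table, chain and interface names are the ones from this report, and the substring check assumes `nft` lists the rule the way it appears in the listings above. It does not restart FRR.

```python
import subprocess

# The rule from the workaround above, split into nft arguments when added.
RULE = 'iifname "eth2" ct state established,related counter accept'
CHAIN = ["ip6", "ip6_firewall", "fw_input"]


def run_nft(args):
    """Run nft with the given argument list and return its stdout."""
    result = subprocess.run(["nft"] + args, check=True,
                            capture_output=True, text=True)
    return result.stdout


def ensure_rule(run=run_nft):
    """Add the conntrack accept rule to fw_input unless it is already listed.

    Returns True if the rule was added, False if it was already present.
    """
    listing = run(["list", "chain"] + CHAIN)
    for line in listing.splitlines():
        if 'iifname "eth2"' in line and "ct state established,related" in line:
            return False
    run(["add", "rule"] + CHAIN + RULE.split())
    return True

# On the VR (as root) this would be invoked as: ensure_rule()
```

Taking the runner as a parameter keeps the check testable off the VR. Running this periodically would only narrow the window in which IPv6 BGP is down after a firewall change; it does not remove it.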
## Proposed fix — VALIDATED on a live VR
Add a v6 equivalent of `fw_router_routing()` in
`systemvm/debian/opt/cloud/bin/cs/CsAddress.py` plus expose `nft_ipv6_fw` on
`CsIP`. `nft_ipv6_fw` already exists on `CsConfig` (line 43); we just need to
plumb it through CsIP and write into it.
Three changes in `CsAddress.py`:
**1. Add reference in `CsIP.__init__` (around line 312):**
```diff
         self.nft_ipv4_fw = config.get_nft_ipv4_fw()
         self.nft_ipv4_acl = config.get_nft_ipv4_acl()
+        self.nft_ipv6_fw = config.get_ipv6_fw()
```
**2. Add new `fw_router_routing_v6()` method (immediately before
`fw_vpcrouter_routing` at line 674):**
```python
def fw_router_routing_v6(self):
    if self.config.is_vpc() or not self.config.is_routed():
        return
    # IPv6 INPUT chain defaults: mirror of fw_router_routing() for v4.
    # Without these, return traffic for VR-initiated v6 connections (BGP etc)
    # is silently dropped by the default-DROP policy.
    self.nft_ipv6_fw.append({'type': "", 'chain': 'fw_input',
                             'rule': "iifname lo counter accept"})
    self.nft_ipv6_fw.append({'type': "", 'chain': 'fw_input',
                             'rule': "iifname eth2 ct state established,related counter accept"})
    if self.get_type() in ["guest"]:
        self.nft_ipv6_fw.append({'type': "", 'chain': 'fw_input',
                                 'rule': "iifname %s ct state established,related counter accept" % self.dev})
```
**3. Call it from `CsIP.configure()` (lines 756-757):**
```diff
         self.fw_router_routing()
         self.fw_vpcrouter_routing()
+        self.fw_router_routing_v6()
```
Note: `eth2` is hardcoded matching the v4 convention (and
`PUBLIC_INTERFACES["router"]` in `CsHelper.py`). A more robust fix could
reference that constant.
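
One way the interface name could be parameterised instead of hardcoded is sketched here as a standalone function so the gating logic can be exercised without a VR. Everything in it is illustrative: `v6_input_defaults` is a hypothetical name, `PUBLIC_INTERFACE` is a stand-in for the `PUBLIC_INTERFACES["router"]` lookup (whose exact shape is not shown in this report), and the dict layout copies the entries used in the patch above.

```python
# Stand-in for PUBLIC_INTERFACES["router"]; the real constant's shape is an assumption.
PUBLIC_INTERFACE = "eth2"

CT_ACCEPT = "iifname %s ct state established,related counter accept"


def v6_input_defaults(is_vpc, is_routed, public_dev=PUBLIC_INTERFACE, guest_dev=None):
    """Build default fw_input entries for a routed, non-VPC router.

    Returns an empty list for VPC routers or non-routed networks,
    matching the early return in the proposed fw_router_routing_v6().
    """
    if is_vpc or not is_routed:
        return []
    rules = ["iifname lo counter accept", CT_ACCEPT % public_dev]
    if guest_dev:
        # Mirrors the guest-interface branch of the proposed patch.
        rules.append(CT_ACCEPT % guest_dev)
    return [{'type': "", 'chain': 'fw_input', 'rule': rule} for rule in rules]


for entry in v6_input_defaults(is_vpc=False, is_routed=True, guest_dev="eth0"):
    print(entry['rule'])
```

Keeping the rule construction in a pure function like this would also make it straightforward to unit-test the VPC and non-routed cases separately.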
### Validation
Applied this patch in-place on a running VR (`r-278-VM`, ACS 4.22.0.0) on
2026-05-16:
1. Pre-patch: v6 BGP stuck in Connect; v6 fw_input chain had only ICMPv6
accept
2. Patch applied; `/opt/cloud/bin/configure.py cmd_line.json` triggered
re-process
3. fw_input chain now includes `iifname "eth2" ct state established,related
counter accept`
4. v6 BGP sessions Established within seconds, PfxRcd=1, PfxSnt=2
**Survival test (the key one)**: After patch, ran `cmk
createIpv6FirewallRule networkid=<net> traffictype=Ingress protocol=tcp
startport=80 endport=80` — this pushes `ipv6_firewall_rules.json` to the VR and
triggers the full IpTablesExecutor flush+rebuild path that previously wiped the
manual nft workaround. **After the FW change:**
- `iifname "eth2" ct state established,related accept` rule **persists** in
fw_input (with active counters)
- Both v6 BGP sessions **still Established**
- End-to-end SSH from public IPv6 internet to VM in the network **still
works**
This confirms the fix is correct and durable. The bug is in CsAddress.py /
`nft_ipv6_fw` not being populated; the rest of the pipeline handles the v6 list
correctly once it has content.
### VPC equivalent
The same gap likely exists in the VPC routed path (`fw_vpcrouter_routing` at
line 674). Not tested here (our setup is non-VPC Isolated) but worth a
symmetric audit.
## Affected versions
**Verified on Apache CloudStack 4.22.0.0** (latest LTS at time of filing).
PR #10970, which added the equivalent FORWARD-chain rule, is present and active
in this build, but the INPUT-chain rule appears to have been dropped in the PR's
second commit ("Remove rule from input chain"), leaving this gap.
Affected versions (by code inspection + PR #10970 history):
- 4.20.2, 4.20.3, 4.21.x, 4.22.0.0, 4.22.0.1 — all affected
- 4.20.0, 4.20.1 — also affected, but for a different reason (PR #10970
itself wasn't yet merged, so both FORWARD and INPUT chains were missing the
rule)
## Severity
**High** for anyone wanting to deploy IPv6-only ROUTED Isolated networks at
scale. The feature appears to work (offering enables, network creates, VR
provisions, BGP-v4 establishes) but tenant v6 traffic doesn't route because
BGP-v6 silently fails. Diagnosis requires packet captures on the underlay — not
obvious from the VR's own view.
## Related
- **PR #10970** ("IPv6 firewall: accept packets from related and established
connections") — landed in 4.20.2 and 4.22.0.0 — added the equivalent rule to
the **FORWARD** chain only. This fixed the VM-return-traffic case (downloads,
etc.) but did NOT add the rule to the **INPUT** chain, leaving the VR's own
outbound BGP return traffic still dropped. The PR discussion mentions a second
commit "Remove rule from input chain" — suggesting an earlier draft did add the
INPUT rule but it was removed in review. The bug described here is the
consequence of that removal: VR-originated v6 connections (BGP, but also NTP,
DNS lookups, etc., that the systemvm itself initiates outbound) fail on the
return.
- `IsolatedV6RoutedFiltered` offering — affected
- `IsolatedV6RoutedOffering` (no Firewall service) — not affected (no
firewall service means no `ip6_firewall` table; v6 BGP works there because no
nftables drop happens)
- IPv4 ROUTED with same offering shape — works as expected (different code
path: `fw_router_routing()` in `CsAddress.py` writes the INPUT `iifname "eth2"
ct state related,established` rule for v4)