First, roll out IPv6 if you haven't yet. That should relieve a lot of pressure 
on your pool size, and it gives customers a workaround for some of the weird 
things ("Use the IPv6 address instead of IPv4.").

Second, build your own geofeed. You can create a CSV providing as much detail 
as you want, down to "This individual address is at this long/lat" if you want. 
Then publish the location of that file in whois.
Short pointer: https://mailman.nanog.org/pipermail/nanog/2022-April/219080.html
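
For anyone who hasn't built one, a geofeed is just a CSV file. Below is a
minimal sketch, assuming the RFC 8805 column layout (prefix, country, region,
city, postal code). The helper name and the per-metro pools are hypothetical,
and the prefixes are documentation ranges, not real allocations. It only helps
if the external pools are actually split by area.

```python
import csv
import io
import ipaddress

def geofeed_csv(entries):
    """Render (prefix, country, region, city) tuples as geofeed CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for prefix, country, region, city in entries:
        # Validate and normalize the prefix; raises ValueError on bad input.
        net = ipaddress.ip_network(prefix)
        # Fifth column (postal code) is intentionally left empty.
        writer.writerow([str(net), country, region, city, ""])
    return out.getvalue()

# Hypothetical example: one external CGNAT pool per metro area.
pools = [
    ("192.0.2.0/25", "US", "US-FL", "Miami"),
    ("192.0.2.128/25", "US", "US-FL", "Jacksonville"),
]
print(geofeed_csv(pools), end="")
```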

After you've rolled out IPv6 you can consider 464XLAT or MAP-T. Both work well, 
but both require support from the CPE. I've heard of a custom implementation 
that kicks a customer off the CGN/xlat/BR if it detects UPnP (i.e., a customer 
that needs port forwarding). It requires reprovisioning the CPE and a reboot, 
but two minutes of downtime probably prevents a support call.

Lee Howard
IPv4.Global


-----Original Message-----
From: NANOG <nanog-bounces+leehoward=hilcostreambank....@nanog.org> On Behalf 
Of Jon Lewis
Sent: Tuesday, October 8, 2024 3:19 PM
To: nanog@nanog.org
Subject: CGNAT growing pains

We started rolling out CGNAT about 6 months ago.  It was smooth sailing for the 
first few months, but we eventually did run into a number of issues.

Our customer base is primarily FTTH with "dynamic" IP assignment via DHCP.
Since connections are always-on, customer ONTs/routers get an IP assigned, and 
then when the lease is renewed, they request a new lease for the existing IP, 
and, in general, that request is granted.  This gives customers the mistaken 
impression they have a static IP.  So, my impression, from working with some 
customers who've needed to be moved from CGNAT back to public IP, is that 
customers who are doing port-forwarding don't even bother with dynamic DNS.  
They just know they can connect to their IP as they've never seen it change.  
We do offer/sell static IP, but pre-CGNAT, it was strictly for business 
customers, i.e. a residential customer could only get static IP service by 
converting their account to a business account.  That may change in the near 
future.

One issue we didn't foresee has been IP Geo.  We all knew that streaming 
services like Netflix use IP Geo to determine what content should be made 
available, but that's, AFAIK, limited by country or region.  What we didn't 
anticipate is services like Hulu Live TV doing IP Geo down to the city level 
to determine which local channels are a subscriber's local channels.  We're 
using Juniper MX gear and SPC3 cards for our CGNAT routers, 
each one having a single large external pool.  Since we serve most of FL, one 
external pool can't IP Geo correctly for customers as far apart as Miami and 
Jacksonville hitting the same CGNAT router.  We don't currently have an 
acceptable solution to this other than moving impacted customers off CGNAT.

One of the great unknowns (at least for us) with CGNAT was what our PBA 
settings should be.  i.e.  How large each port-block should be, and how many 
port-blocks to allow per customer.  We started with 256x4.  It seemed to work.  
We eventually noticed that we were logging port-block exceeded errors.  This is 
one aspect where Juniper's CGNAT support is lacking.  There's a counter for 
these errors, and it's available via SNMP, but there's no way to attribute the 
errors to subscriber IPs.  We're polling the MIB and graphing it, so we know 
it's a continuing issue and can see when it's incrementing faster/slower, but 
Junos provides no means for determining if "PBEs" are all being caused by a 
single customer, a handful of customers, etc.  We have a JTAC case open on 
this.  As a quick & hopeful fix, we increased both the port-block size and the 
block limit.  That helped, but didn't stop the errors.  It also cut our CGNAT 
ratio by more than half (64:1 -> 28:1).  If we stay at this ratio, we'll need 
much larger external pools than originally anticipated.  
Tuning these settings is kind of painful as JTAC strongly recommends bouncing 
the CGNAT service anytime CGNAT related config changes are made.  This means 
briefly breaking Internet access for all CGNAT'd customers.  For the PBEs, 
JTAC's suggestions so far have been to shorten some of the timeouts in the 
config and to keep doing what we're doing, which is a cron job that essentially 
does a "show services nat source port-block", parses the output looking for 
subscriber IPs that have used up the ports in several of their port-blocks, 
then does a "show services sessions source-prefix ..." and logs all of this.  
This at least gives us snapshots of "who's a heavy user right now" and lets us 
look at how they were using all their ports, i.e. was it BitTorrent, are they 
compromised and scanning the internet for more systems to compromise, is it 
legit looking traffic - just lots of it, etc.?
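
The pool-sizing arithmetic behind those ratios can be sketched roughly as
follows. This assumes the ratio is simply ports per external IP divided by the
maximum ports one subscriber may hold, and that all 65536 ports are usable
(which is consistent with 256x4 giving 64:1, but is an assumption; many
deployments exclude the low ports). The 10,000-subscriber figure is purely
illustrative.

```python
import math

def subscribers_per_ip(block_size, max_blocks, ports_per_ip=65536):
    """Worst-case subscribers per external IP when every subscriber may
    hold max_blocks port-blocks of block_size ports each."""
    return ports_per_ip // (block_size * max_blocks)

def external_ips_needed(subscribers, ratio):
    """External pool size (in addresses) for a given subscriber:IP ratio."""
    return math.ceil(subscribers / ratio)

print(subscribers_per_ip(256, 4))       # 64
print(external_ips_needed(10000, 64))   # 157
print(external_ips_needed(10000, 28))   # 358
```

At the same subscriber count, dropping from 64:1 to 28:1 more than doubles the
number of external addresses required.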

The latest CGNAT issue is a customer with a Palo Alto Networks firewall 
connected to our network, several of whose employees are our FTTH customers.  
On their PANW firewall, they're doing IP Geo based filtering, limiting access 
to internal servers to "US IPs".  Since we only CGNAT traffic to the external 
Internet, their on-net employees hit the firewall from their 100.64/10 IPs and 
get blocked.  I suggested they whitelist 100.64/10, saying we block traffic 
from 100.64/10 from entering our network via peering and transit, so they can 
be assured anything from 100.64/10 came from inside our network / our 
customers.  They say the firewall won't let them whitelist 100.64.0.0/10, 
giving an error that it's invalid IP space.
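
For reference, 100.64.0.0/10 is the RFC 6598 shared address space and is a
perfectly valid prefix, covering 100.64.0.0 through 100.127.255.255. A small
sketch of the policy described above, using Python's ipaddress module; the
ingress labels and function names are hypothetical, and this is not PAN-OS
configuration:

```python
import ipaddress

# RFC 6598 shared address space used on the inside of the CGNAT.
SHARED_SPACE = ipaddress.ip_network("100.64.0.0/10")

def from_cgnat_space(addr):
    """True if addr falls inside 100.64.0.0/10."""
    return ipaddress.ip_address(addr) in SHARED_SPACE

def classify(addr, ingress):
    """ingress is a hypothetical label: 'on-net', 'peering' or 'transit'."""
    if from_cgnat_space(addr):
        # Shared-space sources are only trusted when they arrive on-net.
        return "allow" if ingress == "on-net" else "drop"
    return "apply-geo-policy"  # everything else goes through the usual rules

print(SHARED_SPACE[0], SHARED_SPACE[-1])
print(classify("100.64.12.34", "on-net"))
print(classify("100.64.12.34", "transit"))
```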

I know we're not the first to implement CGNAT, so I'm curious if others have 
run into these sorts of issues, or others we haven't run into yet, and if so, 
how you solved them.


----------------------------------------------------------------------
  Jon Lewis, MCP :)              |  I route
  Blue Stream Fiber, Sr. Neteng  |  therefore you are
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
