Re: CGNAT growing pains

Aaron Gould Tue, 08 Oct 2024 13:29:59 -0700

We have ~60,000 subs on FTTH, DSL and cable modem, behind several Juniper MX routers: MX960s with MS-MPC-128G (FTTH and CM) and MX104s with MS-MIC-16G (DSL), and they are doing well. We had some growing pains and issues, but they were resolved with APP, EIM, EIF, and source-IP load balancing on the AMS interface. Also, since all my subs are in L3VPNs, I had to share the inet.0 metric with inet.3 to get MP-iBGP to see the other MXs as the least-cost route and accomplish nice load balancing. We did about 3000 ports per sub, so 100-port blocks at a max of 30 blocks (100*30=3000). We usually do a /24 or /23 at each MX960, and I recall a /25 at the DSL MX104s. I've actually seen high-point max usage of an MS-MPC-128G flat-line during peak time at approx 65 Gbps, and even more recently I recall seeing about 70 Gbps. That's on a single MS-MPC-128G. I hope I don't have to upgrade to SPC (or dual MS-MPC-128G). I'd rather do dual-stack IPv6 and bypass the CGNAT boundary. That's what my current focus is.
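For anyone wanting to play with the numbers, here is a rough Python sketch of the worst-case port-block arithmetic behind 100*30=3000. Only the block size (100) and block limit (30) come from the message above; the usable port range (1024-65535), the function names, and the choice to count every address in the prefix as usable are assumptions for illustration. Port blocks are handed out on demand, so this is a floor on how many subs fit if everyone maxed out at once, not a forecast of real usage.

```python
# Worst-case port-block budget. Assumption: ports 1024-65535 of each
# public IP are available for NAT; adjust to match the real pool config.
USABLE_PORTS_PER_IP = 65536 - 1024  # 64512


def ports_per_sub(block_size: int, max_blocks: int) -> int:
    """Maximum ports a single subscriber can hold."""
    return block_size * max_blocks


def subs_per_ip(block_size: int, max_blocks: int,
                usable_ports: int = USABLE_PORTS_PER_IP) -> int:
    """Subscribers one public IP can serve if every sub uses all blocks."""
    return usable_ports // ports_per_sub(block_size, max_blocks)


def worst_case_subs(prefix_len: int, block_size: int, max_blocks: int,
                    usable_ports: int = USABLE_PORTS_PER_IP) -> int:
    """Subscribers an IPv4 pool can serve at full per-sub port usage.

    Assumes every address in the prefix is usable as a NAT address.
    """
    pool_ips = 2 ** (32 - prefix_len)
    return pool_ips * subs_per_ip(block_size, max_blocks, usable_ports)


if __name__ == "__main__":
    print(ports_per_sub(100, 30))
    print(subs_per_ip(100, 30))
    for plen in (23, 24, 25):
        print(plen, worst_case_subs(plen, 100, 30))
```

Under those assumptions, a pool only has to hold the number of subs that are simultaneously active and heavy, which is why a /24 or /23 can sit in front of far more subs than the worst-case figure suggests.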


-Aaron


On 10/8/2024 2:19 PM, Jon Lewis wrote:

We started rolling out CGNAT about 6 months ago. It was smooth sailing for the first few months, but we eventually did run into a number of issues.
Our customer base is primarily FTTH with "dynamic" IP assignment via DHCP. Since connections are always-on, customer ONTs/routers get an IP assigned, and then when the lease is renewed, they request a new lease for the existing IP, and, in general, that request is granted. This gives customers the mistaken impression they have a static IP. So, my impression, from working with some customers who've needed to be moved from CGNAT back to public IP, is that customers who are doing port-forwarding don't even bother with dynamic DNS. They just know they can connect to their IP as they've never seen it change. We do offer/sell static IP, but pre-CGNAT, it was strictly for business customers, i.e. a residential customer could only get static IP service by converting their account to a business account. That may change in the near future.
One issue we didn't foresee has been IP Geo issues, i.e. we all knew that streaming services like Netflix use IP Geo to determine what content should be made available, but that's, AFAIK, limited by country or region. What we didn't anticipate is services like Hulu Live TV doing IP Geo down to the city level to determine which local channels are a subscriber's local channels. We're using Juniper MX gear and SPC3 cards for our CGNAT routers, each one having a single large external pool. Since we serve most of FL, one external pool can't IP Geo correctly for customers as far apart as Miami and Jacksonville hitting the same CGNAT router. We don't currently have an acceptable solution to this other than moving impacted customers off CGNAT.
One of the great unknowns (at least for us) with CGNAT was what our PBA settings should be, i.e. how large each port-block should be, and how many port-blocks to allow per customer. We started with 256x4. It seemed to work. We eventually noticed that we were logging port-block exceeded errors. This is one aspect where Juniper's CGNAT support is lacking. There's a counter for these errors, and it's available via SNMP, but there's no way to attribute the errors to subscriber IPs. We're polling the MIB and graphing it, so we know it's a continuing issue and can see when it's incrementing faster/slower, but Junos provides no means for determining if "PBEs" are all being caused by a single customer, a handful of customers, etc. We have a JTAC case open on this. As a quick & hopeful fix, we increased both the port-block size and the block limit. That helped, but didn't stop the errors. It also cut our CGNAT ratio by more than half (64:1 -> 28:1). If we stay at this ratio, we'll need much larger external pools than originally anticipated. Tuning these settings is kind of painful, as JTAC strongly recommends bouncing the CGNAT service anytime CGNAT-related config changes are made. This means briefly breaking Internet access for all CGNAT'd customers. For the PBEs, JTAC's suggestions so far have been to shorten some of the timeouts in the config and to keep doing what we're doing, which is a cron job that essentially does a "show services nat source port-block", parses the output looking for subscriber IPs that have used up the ports in several of their port-blocks, then does a "show services sessions source-prefix ..." and logs all of this. This at least gives us snapshots of "who's a heavy user right now" and lets us look at how they were using all their ports, i.e. was it BitTorrent, are they compromised and scanning the internet for more systems to compromise, is it legit-looking traffic - just lots of it, etc.?
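The "who's a heavy user right now" step of a cron job like that might look roughly like the Python sketch below. It is not the script described above: it assumes the port-block output has already been parsed into per-block records, and the record layout (PortBlock), the function name, and both thresholds are hypothetical. Collecting and parsing the actual CLI output is deliberately left out, since the output format is not shown here.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class PortBlock(NamedTuple):
    """One allocated port-block (hypothetical, already-parsed record)."""
    subscriber_ip: str  # CGNAT-side (private) address of the subscriber
    ports_used: int
    ports_total: int


def heavy_users(blocks: Iterable[PortBlock],
                full_threshold: float = 0.9,
                min_full_blocks: int = 3) -> List[str]:
    """Return subscriber IPs holding at least min_full_blocks nearly
    exhausted blocks, busiest first."""
    full = Counter()
    for b in blocks:
        if b.ports_total > 0 and b.ports_used / b.ports_total >= full_threshold:
            full[b.subscriber_ip] += 1
    ranked = sorted(full.items(), key=lambda item: (-item[1], item[0]))
    return [ip for ip, count in ranked if count >= min_full_blocks]


if __name__ == "__main__":
    sample = [
        PortBlock("100.64.1.10", 256, 256),
        PortBlock("100.64.1.10", 256, 256),
        PortBlock("100.64.1.10", 256, 256),
        PortBlock("100.64.1.10", 12, 256),
        PortBlock("100.64.2.20", 250, 256),
        PortBlock("100.64.2.20", 40, 256),
    ]
    for ip in heavy_users(sample):
        # A real job would follow up with a per-subscriber session dump here.
        print(ip)
```

Each flagged IP would then be the argument for the per-subscriber session lookup, so the log shows what the ports were actually being used for.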
The latest CGNAT issue is a customer with a Palo Alto Networks firewall connected to our network and several of their employees are our FTTH customers. On their PANW firewall, they're doing IP Geo based filtering, limiting access to internal servers to "US IPs". Since we only CGNAT traffic to the external Internet, their on-net employees hit the firewall from their 100.64/10 IPs and get blocked. I suggested they whitelist 100.64/10, saying we block traffic from 100.64/10 from entering our network via peering and transit, so they can be assured anything from 100.64/10 came from inside our network / our customers. They say the firewall won't let them whitelist 100.64.0.0/10, giving an error that it's invalid IP space.
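For reference, 100.64.0.0/10 is the shared address space from RFC 6598 and runs from 100.64.0.0 through 100.127.255.255. A small Python sketch using the standard ipaddress module shows the membership check; the helper name is made up for illustration, and this says nothing about how the PANW firewall itself validates prefixes.

```python
import ipaddress

# RFC 6598 shared address space, used on the subscriber side of CGNAT.
SHARED_ADDRESS_SPACE = ipaddress.ip_network("100.64.0.0/10")


def is_cgnat_source(addr: str) -> bool:
    """True if addr falls inside 100.64.0.0/10."""
    return ipaddress.ip_address(addr) in SHARED_ADDRESS_SPACE


if __name__ == "__main__":
    for addr in ("100.64.0.1", "100.127.255.255", "100.128.0.0", "8.8.8.8"):
        print(addr, is_cgnat_source(addr))
```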
I know we're not the first to implement CGNAT, so I'm curious if others have run into these sorts of issues, or others we haven't run into yet, and if so, how you solved them.
----------------------------------------------------------------------
 Jon Lewis, MCP :)              |  I route
 Blue Stream Fiber, Sr. Neteng  |  therefore you are
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________


--
-Aaron
