Thanks for this. We turned off NetFlow export and it dropped our QFP load from 44% to 18%. Ugh.
---
Colin Legendre

On Thu, Dec 9, 2021 at 4:22 AM Brian Turnbow via NANOG <nanog@nanog.org> wrote:

> >
> > On 11/26/2021 1:09 PM, Colin Legendre wrote:
> > > Hi,
> > >
> > > We have ...
> > >
> > > ASR1006 that has following cards...
> > > 1 x ESP40
> > > 1 x SIP40
> > > 4 x SPA-1x10GE-L-V2
> > > 1 x 6TGE
> > > 1 x RP2
> > >
> > > We've been having latency and packet loss during peak periods...
> > >
> > > We notice all is good until we reach 50% utilization on output of...
> > >
> > > 'show platform hardware qfp active datapath utilization summary'
> > >
> > > Literally ... 47% good... 48% good... 49% latency to next hop goes
> > > from 1ms to 15-20ms... 50% we see 1-2% packet-loss and 30-40ms
> > > latency... 53% we see 60-70ms latency and 8-10% packet loss.
> > >
> > > Is this expected... the ESP40 can only really push 20G and then starts
> > > to have performance issues?
> > >
>
> We had a similar issue about 4 years ago.
> We were showing packet loss and drops getting progressively worse and the
> router was falling over when reaching about 70% of usage.
> We could see the interface reliability go down and input errors due to
> overruns on the interfaces.
> Cisco blamed it on microbursts not being able to be handled under load.
>
>
> "We were able to replicate this scenario in our lab as well.
> QFP under high load generated input errors and overruns which in turn led
> to unicast failures/ drops/ latency.
> The issue is not consistent with QFP % utilization as sometimes with even
> 80%+ traffic, we do not see the drops:"
>
> And recommended removing traffic or upgrading the ESP.
>
> One of our guys disabled NBAR on the router and the problem disappeared.
> I would suggest taking a look at what features you are using and if you
> can try and disable them to see if it makes any impact.
> We then upgraded ESPs and all has been fine since.
>
> Brian
>