Re: Issue with BGP router / high interrupt / Chelsio / FreeBSD 12.1

2020-02-14 Thread Andrey V. Elsukov
On 13.02.2020 06:21, Rudy wrote:
> 
> 
> I'm having issues with a box that is acting as a BGP router for my
> network.  3 Chelsio cards, two T5 and one T6.  It was working great
> until I turned up our first port on the T6.  It seems like traffic
> passing in from a T5 card and out the T6 causes a really high load (and
> high interrupts).
> 
> Traffic (not that much, right?)
> 
>  Dev    RX bps    TX bps    RX PPS    TX PPS   Error
>  cc0         0         0         0         0       0
>  cc1    2212 M       7 M     250 k       6 k       0   (100Gbps uplink, filtering inbound routes to keep TX low)
>  cxl0    287 k    2015 M     353       244 k       0   (our network)
>  cxl1    940 M    3115 M     176 k     360 k       0   (our network)
>  cxl2    634 M    1014 M     103 k     128 k       0   (our network)
>  cxl3      1 k      16 M       1         4 k       0
>  cxl4        0         0       0           0       0
>  cxl5        0         0       0           0       0
>  cxl6   2343 M     791 M     275 k     137 k       0   (IX, part of lagg0)
>  cxl7   1675 M     762 M     215 k     133 k       0   (IX, part of lagg0)
>  ixl0    913 k      18 M       0           0       0
>  ixl1      1 M      30 M       0           0       0
>  lagg0  4019 M    1554 M     491 k     271 k       0
>  lagg1     1 M      48 M       0           0       0
> 
> FreeBSD 12.1-STABLE  orange  976 Bytes/Packet avg
>  1:42PM  up 13:25, 5 users, load averages: 9.38, 10.43, 9.827

Hi,

Did you try using pmcstat to determine what the heaviest task on your
system is?

# kldload hwpmc
# pmcstat -S inst_retired.any -Tw1

Then capture the first several lines of the output and quit with 'q'.
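
If the interactive view is awkward to capture, an offline run along these
lines should also work (the -O/-R/-G flags are the usual pmcstat ones, but
treat this as an untested sketch):

# kldload hwpmc
# pmcstat -S inst_retired.any -O /tmp/router.pmc sleep 30   # sample for ~30 seconds
# pmcstat -R /tmp/router.pmc -G /tmp/callgraph.txt          # turn the samples into a callgraph
# head -n 40 /tmp/callgraph.txt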

Do you use a firewall? Also, can you show a snapshot of the `top
-HPSIzts1` output?

-- 
WBR, Andrey V. Elsukov





[Bug 194485] Userland cannot add IPv6 prefix routes

2020-02-14 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194485

Alexander V. Chernikov  changed:

           What            |Removed          |Added
           Assignee        |n...@freebsd.org |melif...@freebsd.org

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.


Re: Issue with BGP router / high interrupt / Chelsio / FreeBSD 12.1

2020-02-14 Thread Rudy

On 2/12/20 7:21 PM, Rudy wrote:
> I'm having issues with a box that is acting as a BGP router for my
> network.  3 Chelsio cards, two T5 and one T6.  It was working great
> until I turned up our first port on the T6.  It seems like traffic
> passing in from a T5 card and out the T6 causes a really high load (and
> high interrupts).



Looking better!  I made some changes based on BSDRP which I hadn't known 
about -- I think ifqmaxlen was the tunable I overlooked.


# https://github.com/ocochard/BSDRP/blob/master/BSDRP/Files/boot/loader.conf.local
net.link.ifqmaxlen="16384"

Also, I ran chelsio_affinity to bind queues to specific CPU cores.  The 
script only supports a single T5 card; I am revising it and will submit a 
patch that handles multiple T5 and T6 cards (a rough sketch of the binding 
idea is below).
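
The core of it is just cpuset(1) against the nexus queue vectors; a rough,
untested sketch of the idea (device name, regex and CPU list here are
assumptions for this box, not the actual script):

#!/bin/sh
# Round-robin the t6nex0 queue interrupts across the domain-0 cores,
# keeping cpu0 free for the err/evt vectors.
cpus="1 2 3 4 5 6 7 8 9"
ncpu=$(echo $cpus | wc -w | tr -d ' ')
i=0
vmstat -ai | awk '/t6nex0:[01]a/ { gsub(/[^0-9]/, "", $1); print $1 }' |
while read irq; do
    cpu=$(echo $cpus | cut -d ' ' -f $(( i % ncpu + 1 )))
    cpuset -l "$cpu" -x "$irq"    # pin this MSI-X vector to one core
    i=$(( i + 1 ))
done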



I made both changes at once, and rebooted, so we'll never know which 
fixed it.  ;)




Right now, I have:

#/boot/loader.conf
#
# https://wiki.freebsd.org/10gFreeBSD/Router
hw.cxgbe.toecaps_allowed="0"
hw.cxgbe.rdmacaps_allowed="0"
hw.cxgbe.iscsicaps_allowed="0"
hw.cxgbe.fcoecaps_allowed="0"
hw.cxgbe.holdoff_timer_idx=3
# Before FreeBSD 13, hyper-threading hurts forwarding on a router:
# https://calomel.org/freebsd_network_tuning.html

machdep.hyperthreading_allowed="0"
hw.cxgbe.nrxq=16
hw.cxgbe.ntxq=16
hw.cxgbe.qsize_rxq=4096
hw.cxgbe.qsize_txq=4096
#hw.cxgbe.pause_settings="0"
# https://github.com/ocochard/BSDRP/blob/master/BSDRP/Files/boot/loader.conf.local
net.link.ifqmaxlen="16384"




#/etc/sysctl.conf
# FRR needs big buffers for OSPF
kern.ipc.maxsockbuf=16777216

# Turn FEC off (doesn't work with Cogent)
dev.cc.0.fec=0
dev.cc.1.fec=0

# Entropy not from LAN ports... slows them down.
kern.random.harvest.mask=65551

net.inet.icmp.icmplim=400
net.inet.icmp.maskrepl=0
net.inet.icmp.log_redirect=0
net.inet.icmp.drop_redirect=1
net.inet.tcp.drop_synfin=1
net.inet.tcp.blackhole=2  # drop any TCP packets to closed ports
net.inet.tcp.msl=7500     # close lost TCP connections in 7.5 seconds (default 30)

net.inet.udp.blackhole=1  # drop any UDP packets to closed ports
#
hw.intr_storm_threshold=9000
net.inet.tcp.tso=0
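
A quick sanity check after the reboot, to confirm the tunables actually took
effect (read-only sysctls; the names are what I'd expect on 12.1, so
double-check them):

# sysctl net.link.ifqmaxlen machdep.hyperthreading_allowed
# sysctl kern.ipc.maxsockbuf kern.random.harvest.mask_symbolic
# sysctl dev.cc.0.fec dev.cc.1.fec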





RE: Communication Technology Professionals

2020-02-14 Thread Sarah Nelson via freebsd-net
Hi,
 
Hope you're doing well!!
 
I am following up to find out if you had a chance to review my email below.
 
Please let me know if you require any additional information.
 
Looking forward to hearing from you.
 
Regards,
Sarah
 
From: Sarah Nelson [mailto:sarah.nel...@datacloudspace.com] 
Sent: 13 February 2020 12:42
To: 'freebsd-net@freebsd.org'
Subject: Communication Technology Professionals
 
Hi, 
 
I was researching and looking for companies that would see a dramatic
improvement in their business if they had access to the below mentioned
database:
 
*  Doctors, Physicians, Surgeons, Nurses and Dentist across US - 752,591
records (with verified emails).
*  Top Hospital Executives across US - 323,974 records (with verified
emails).
*  Pharmaceuticals Industry Executives across US - 223,179 records (with
verified emails).
*  Top healthcare IT Executives across US - 25,583 records (with verified
emails).
*  Healthcare Software users list like EMR Users, EHR Users etc., 
 
All Industrial Sectors: Technology | Logistics | Oil & Gas | Automotive |
Energy| Transportation | Construction | Pharmaceuticals | Veterinary |
Travel & Tourism |Telecommunications | Retail | Banking | Manufacturing
|Interior Designers | Facility Management | Education & E-Learning
|Architects | Food & Beverages | Real Estate| HR | Hospitality | Aviation
and more. etc.
 
If you are looking for any specific lists let me know, so we could send over
counts & cost. Set up a time to discuss further.
 
Target Industry: __
Target Title: _
Target Geography: __
 
If there is someone else in your organization that I need to speak with, I'd
be grateful if you would forward this email to the appropriate contact and
help me with their introduction.
 
Appreciate your time and I look forward to hearing from you.
 
Regards,
 
Warm Regards,
Sarah Nelson | CD Services
If this email is irrelevant to you, please help us keep our lists clean by
replying to this email with UNSUBSCRIBE


Re: Issue with BGP router / high interrupt / Chelsio / FreeBSD 12.1

2020-02-14 Thread Olivier Cochard-Labbé
On Fri, Feb 14, 2020 at 6:25 PM Rudy  wrote:

> On 2/12/20 7:21 PM, Rudy wrote:
>  > I'm having issues with a box that is acting as a BGP router for my
> network.  3 Chelsio cards, two T5 and one T6.  It was working great
> until I turned up our first port on the T6.  It seems like traffic
> passing in from a T5 card and out the T6 causes a really high load (and
> high interrupts).
>
>
> Looking better!  I made some changes based on BSDRP which I hadn't known
> about -- I think ifqmaxlen was the tunable I overlooked.
>
> #
>
> https://github.com/ocochard/BSDRP/blob/master/BSDRP/Files/boot/loader.conf.local
> net.link.ifqmaxlen="16384"
>
>
This net.link.ifqmaxlen was set to help with lagg usage; I was not
aware it could improve your use case.

From your first post, it looks like your setup is 2 packages, each with 10
cores and 20 threads (hyper-threading disabled).
And you have configured your Chelsio cards to use 16 queues
(hw.cxgbe.nrxq/ntxq=16).
It's a good thing to have a power-of-2 number of queues with Chelsio, but
I'm not sure it's a good idea to spread those queues across the 2 packages.
So perhaps you should try one of these (a rough loader.conf sketch for the
first option follows below):
1. Reduce to 8 queues and bind them to the local domain.
2. Or keep 16 queues, but re-enable Hyper-Threading and bind them to the
local domain too (on -head with a recent CPU and
machdep.hyperthreading_intr_allowed, using hyper-threading improves
forwarding performance).
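
For option 1, something like this in loader.conf, plus re-running your
affinity script against the domain-local cores, would be my first try (an
untested sketch; the values are only illustrative):

hw.cxgbe.nrxq="8"
hw.cxgbe.ntxq="8"
# then bind each port's 8 queue vectors to cores in the card's local
# NUMA domain (cpuset -x / chelsio_affinity)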

But anyway, even with 16 queues spread over 2 domains, you should be seeing
better performance than this:
https://github.com/ocochard/netbenches/blob/master/Xeon_E5-2650v4_2x12Cores-Chelsio_T520-CR/hw.cxgbe.nXxq/results/fbsd12-stable.r354440.BSDRP.1.96/README.md

Note that I never monitored the CPU load during my benchmarks.
Increasing hw.cxgbe.holdoff_timer_idx was a good idea; I would expect
lower interrupt usage from it too.

Did you monitor the QPI link usage? (kldload cpuctl && pcm-numa.x)
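
For reference, the rough procedure (the port/package name and the delay
argument are from memory, so double-check):

# pkg install intel-pcm
# kldload cpuctl
# pcm-numa.x 1    # refresh every second; watch the remote DRAM accesses per core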


Re: Issue with BGP router / high interrupt / Chelsio / FreeBSD 12.1

2020-02-14 Thread BulkMailForRudy


On 2/14/20 10:00 AM, Olivier Cochard-Labbé wrote:

> On Fri, Feb 14, 2020 at 6:25 PM Rudy  wrote:
>
>> On 2/12/20 7:21 PM, Rudy wrote:
>>> I'm having issues with a box that is acting as a BGP router for my
>>> network.  3 Chelsio cards, two T5 and one T6.  It was working great
>>> until I turned up our first port on the T6.  It seems like traffic
>>> passing in from a T5 card and out the T6 causes a really high load (and
>>> high interrupts).
>>
>> Looking better!  I made some changes based on BSDRP which I hadn't known
>> about -- I think ifqmaxlen was the tunable I overlooked.
>>
>> # https://github.com/ocochard/BSDRP/blob/master/BSDRP/Files/boot/loader.conf.local
>> net.link.ifqmaxlen="16384"
>
> This net.link.ifqmaxlen was set to help with lagg usage; I was not
> aware it could improve your use case.



Thanks for the feedback.  Maybe it was a coincidence.  Load has crept 
back up to 15.




> From your first post, it looks like your setup is 2 packages, each with 10
> cores and 20 threads (hyper-threading disabled).
> And you have configured your Chelsio cards to use 16 queues
> (hw.cxgbe.nrxq/ntxq=16).
> It's a good thing to have a power-of-2 number of queues with Chelsio, but
> I'm not sure it's a good idea to spread those queues across the 2 packages.
> So perhaps you should try one of these:
> 1. Reduce to 8 queues and bind them to the local domain.
> 2. Or keep 16 queues, but re-enable Hyper-Threading and bind them to the
> local domain too (on -head with a recent CPU and
> machdep.hyperthreading_intr_allowed, using hyper-threading improves
> forwarding performance).
>
> But anyway, even with 16 queues spread over 2 domains, you should be seeing
> better performance than this:
> https://github.com/ocochard/netbenches/blob/master/Xeon_E5-2650v4_2x12Cores-Chelsio_T520-CR/hw.cxgbe.nXxq/results/fbsd12-stable.r354440.BSDRP.1.96/README.md



OK, I can work on the chelsio_affinity script.  ...an hour later...  OK, 
tested and updated on GitHub.





> Note that I never monitored the CPU load during my benchmarks.
> Increasing hw.cxgbe.holdoff_timer_idx was a good idea; I would expect
> lower interrupt usage from it too.


I have some standard SNMP monitoring and can correlate the load 
spinning out of control with ping loss and packet loss.


# vmstat -i | tail -1
Total    12217353774 324329



> Did you monitor the QPI link usage? (kldload cpuctl && pcm-numa.x)



I haven't.  I'll look into that.  Hoping the numa-domain locking helps.




Currently I have things bound to the right domain, just need to shrink 
the queue size and reboot!
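
Before the reboot I can also spot-check a few vectors to make sure the
binding stuck, e.g. (irq number is just one from the list below):

# cpuset -g -x 307    # prints the CPU mask for that interrupt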



irq289: t6nex0:err:261 @cpu0(domain0): 0
irq290: t6nex0:evt:263 @cpu0(domain0): 4
irq291: t6nex0:0a0:265 @cpu1(domain0): 0
irq292: t6nex0:0a1:267 @cpu2(domain0): 0
irq293: t6nex0:0a2:269 @cpu3(domain0): 0
irq294: t6nex0:0a3:271 @cpu4(domain0): 0
irq295: t6nex0:0a4:273 @cpu5(domain0): 0
irq296: t6nex0:0a5:275 @cpu6(domain0): 0
irq297: t6nex0:0a6:277 @cpu7(domain0): 0
irq298: t6nex0:0a7:279 @cpu8(domain0): 0
irq299: t6nex0:0a8:281 @cpu9(domain0): 0
irq300: t6nex0:0a9:283 @cpu1(domain0): 0
irq301: t6nex0:0aa:285 @cpu2(domain0): 0
irq302: t6nex0:0ab:287 @cpu3(domain0): 0
irq303: t6nex0:0ac:289 @cpu4(domain0): 0
irq304: t6nex0:0ad:291 @cpu5(domain0): 0
irq305: t6nex0:0ae:293 @cpu6(domain0): 0
irq306: t6nex0:0af:295 @cpu7(domain0): 0
irq307: t6nex0:1a0:297 @cpu8(domain0): 185404641
irq308: t6nex0:1a1:299 @cpu9(domain0): 146802111
irq309: t6nex0:1a2:301 @cpu1(domain0): 133930820
irq310: t6nex0:1a3:303 @cpu2(domain0): 173156318
irq311: t6nex0:1a4:305 @cpu3(domain0): 132151349
irq312: t6nex0:1a5:307 @cpu4(domain0): 149108252
irq313: t6nex0:1a6:309 @cpu5(domain0): 149196634
irq314: t6nex0:1a7:311 @cpu6(domain0): 184211395
irq315: t6nex0:1a8:313 @cpu7(domain0): 151266056
irq316: t6nex0:1a9:315 @cpu8(domain0): 169259534
irq317: t6nex0:1aa:317 @cpu9(domain0): 164117244
irq318: t6nex0:1ab:319 @cpu1(domain0): 157471862
irq319: t6nex0:1ac:321 @cpu2(domain0): 127662140
irq320: t6nex0:1ad:323 @cpu3(domain0): 172750013
irq321: t6nex0:1ae:325 @cpu4(domain0): 173559485
irq322: t6nex0:1af:327 @cpu5(domain0): 227842473
irq323: t5nex0:err:329 @cpu0(domain1): 0
irq324: t5nex0:evt:331 @cpu0(domain1): 8
irq325: t5nex0:0a0:333 @cpu10(domain1): 1340449
irq326: t5nex0:0a1:335 @cpu11(domain1): 1128580
irq327: t5nex0:0a2:337 @cpu12(domain1): 1311599
irq328: t5nex0:0a3:339 @cpu13(domain1): 1157356
irq329: t5nex0:0a4:341 @cpu14(domain1): 1257426
irq330: t5nex0:0a5:343 @cpu15(domain1): 1169697
irq331: t5nex0:0a6:345 @cpu16(domain1): 1089689
irq332: t5nex0:0a7:347 @cpu17(domain1): 1117782
irq333: t5nex0:0a8:349 @cpu18(domain1): 1186770
irq334: t5nex0:0a9:351 @cpu19(domain1): 1147015
irq335: t5nex0:0aa:353 @cpu10(domain1): 1238148
irq336: t5nex0:0ab:355 @cpu11(domain1): 1134259
irq337: t5nex0:0ac:357 @cpu12(domain1): 1262301
irq338: t5nex0:0ad:359 @cpu13(domain1): 1233933
irq339: t5nex0:0ae:361 @cpu14(domain1): 1284298
irq340: t5nex0:0af:363 @cpu15(domain1): 1257873
irq341: t5nex0:1a0:365 @cpu16(domain1): 204307929
irq342: t5nex0:1a1:367 @cpu17(domain1): 221035308
irq343: t5nex0:1a2:369 @cpu18(domain1): 21

Re: Issue with BGP router / high interrupt / Chelsio / FreeBSD 12.1

2020-02-14 Thread BulkMailForRudy


On 2/14/20 4:21 AM, Andrey V. Elsukov wrote:

> On 13.02.2020 06:21, Rudy wrote:
>> I'm having issues with a box that is acting as a BGP router for my
>> network.  3 Chelsio cards, two T5 and one T6.  It was working great
>> until I turned up our first port on the T6.  It seems like traffic
>> passing in from a T5 card and out the T6 causes a really high load (and
>> high interrupts).
>>
>> Traffic (not that much, right?)
>>
>>  Dev    RX bps    TX bps    RX PPS    TX PPS   Error
>>  cc0         0         0         0         0       0
>>  cc1    2212 M       7 M     250 k       6 k       0   (100Gbps uplink, filtering inbound routes to keep TX low)
>>  cxl0    287 k    2015 M     353       244 k       0   (our network)
>>  cxl1    940 M    3115 M     176 k     360 k       0   (our network)
>>  cxl2    634 M    1014 M     103 k     128 k       0   (our network)
>>  cxl3      1 k      16 M       1         4 k       0
>>  cxl4        0         0       0           0       0
>>  cxl5        0         0       0           0       0
>>  cxl6   2343 M     791 M     275 k     137 k       0   (IX, part of lagg0)
>>  cxl7   1675 M     762 M     215 k     133 k       0   (IX, part of lagg0)
>>  ixl0    913 k      18 M       0           0       0
>>  ixl1      1 M      30 M       0           0       0
>>  lagg0  4019 M    1554 M     491 k     271 k       0
>>  lagg1    1 M      48 M       0           0       0
>>
>> FreeBSD 12.1-STABLE  orange  976 Bytes/Packet avg
>>  1:42PM  up 13:25, 5 users, load averages: 9.38, 10.43, 9.827
>
> Hi,
>
> Did you try using pmcstat to determine what the heaviest task on your
> system is?
>
> # kldload hwpmc
> # pmcstat -S inst_retired.any -Tw1



PMC: [inst_retired.any] Samples: 168557 (100.0%) , 2575 unresolved
Key: q => exiting...
%SAMP IMAGE  FUNCTION             CALLERS
 16.6 kernel sched_idletd         fork_exit
 14.7 kernel cpu_search_highest   cpu_search_highest:12.4 sched_switch:1.4 sched_idletd:0.9
 10.5 kernel cpu_search_lowest    cpu_search_lowest:9.6 sched_pickcpu:0.9
  4.2 kernel eth_tx               drain_ring
  3.4 kernel rn_match             fib4_lookup_nh_basic
  2.4 kernel lock_delay           __mtx_lock_sleep
  1.9 kernel mac_ifnet_check_tran ether_output



> Then capture the first several lines of the output and quit with 'q'.
>
> Do you use a firewall? Also, can you show a snapshot of the `top
> -HPSIzts1` output?



last pid: 28863;  load averages:  9.30, 10.33, 10.56    up 0+14:16:08  14:53:23

817 threads:   25 running, 586 sleeping, 206 waiting
CPU 0:   0.8% user,  0.0% nice,  6.2% system,  0.0% interrupt, 93.0% idle
CPU 1:   2.4% user,  0.0% nice,  0.0% system,  7.9% interrupt, 89.8% idle
CPU 2:   0.0% user,  0.0% nice,  0.8% system,  7.1% interrupt, 92.1% idle
CPU 3:   1.6% user,  0.0% nice,  0.0% system, 10.2% interrupt, 88.2% idle
CPU 4:   0.0% user,  0.0% nice,  0.0% system,  9.4% interrupt, 90.6% idle
CPU 5:   0.8% user,  0.0% nice,  0.8% system, 20.5% interrupt, 78.0% idle
CPU 6:   1.6% user,  0.0% nice,  0.0% system,  5.5% interrupt, 92.9% idle
CPU 7:   0.0% user,  0.0% nice,  0.0% system,  3.1% interrupt, 96.9% idle
CPU 8:   0.8% user,  0.0% nice,  0.8% system,  7.1% interrupt, 91.3% idle
CPU 9:   0.0% user,  0.0% nice,  0.8% system,  9.4% interrupt, 89.8% idle
CPU 10:  0.0% user,  0.0% nice,  0.0% system, 35.4% interrupt, 64.6% idle
CPU 11:  0.0% user,  0.0% nice,  0.0% system, 36.2% interrupt, 63.8% idle
CPU 12:  0.0% user,  0.0% nice,  0.0% system, 38.6% interrupt, 61.4% idle
CPU 13:  0.0% user,  0.0% nice,  0.0% system, 49.6% interrupt, 50.4% idle
CPU 14:  0.0% user,  0.0% nice,  0.0% system, 46.5% interrupt, 53.5% idle
CPU 15:  0.0% user,  0.0% nice,  0.0% system, 32.3% interrupt, 67.7% idle
CPU 16:  0.0% user,  0.0% nice,  0.0% system, 46.5% interrupt, 53.5% idle
CPU 17:  0.0% user,  0.0% nice,  0.0% system, 56.7% interrupt, 43.3% idle
CPU 18:  0.0% user,  0.0% nice,  0.0% system, 31.5% interrupt, 68.5% idle
CPU 19:  0.0% user,  0.0% nice,  0.8% system, 34.6% interrupt, 64.6% idle
Mem: 636M Active, 1159M Inact, 5578M Wired, 24G Free
ARC: 1430M Total, 327M MFU, 589M MRU, 32K Anon, 13M Header, 502M Other
 268M Compressed, 672M Uncompressed, 2.51:1 Ratio
Swap: 4096M Total, 4096M Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   12 root        -92    -     0B  3376K WAIT    13  41:13  12.86% intr{irq358: t5nex0:2a1}
   12 root        -92    -     0B  3376K WAIT    12  48:08  12.77% intr{irq347: t5nex0:1a6}
   12 root        -92    -     0B  3376K CPU13   13  47:40  11.96% intr{irq348: t5nex0:1a7}
   12 root        -92    -     0B  3376K WAIT    17  43:46  11.38% intr{irq342: t5nex0:1a1}
   12 root        -92    -     0B  3376K WAIT    14  29:17  10.70% intr{irq369: t5nex0:2ac}
   12 root        -92    -     0B  3376K WAIT    11  47:55   9.85% intr{irq428: t5nex1:2a5}
   12 root        -92    -     0B  3376K WAIT    16  46:11   9.22% intr{irq351: t5nex0:1aa}
   12 root        -92    -     0B  3

Re: chelsio_affinity patch to support t6 cards

2020-02-14 Thread BulkMailForRudy


On 2/13/20 9:56 PM, Rudy wrote:


> Supports t6 as well as t5 cards.  Also, is this desired?



___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"