Re: Issue with BGP router / high interrupt / Chelsio / FreeBSD 12.1
On 13.02.2020 06:21, Rudy wrote:
> I'm having issues with a box that is acting as a BGP router for my
> network.  3 Chelsio cards, two T5 and one T6.  It was working great
> until I turned up our first port on the T6.  It seems like traffic
> passing in from a T5 card and out the T6 causes a really high load
> (and high interrupts).
>
> Traffic (not that much, right?)
>
> Dev      RX bps    TX bps   RX PPS  TX PPS  Error
> cc0           0         0        0       0      0
> cc1      2212 M       7 M    250 k     6 k      0  (100Gbps uplink, filtering inbound routes to keep TX low)
> cxl0      287 k    2015 M      353   244 k      0  (our network)
> cxl1      940 M    3115 M    176 k   360 k      0  (our network)
> cxl2      634 M    1014 M    103 k   128 k      0  (our network)
> cxl3        1 k      16 M        1     4 k      0
> cxl4          0         0        0       0      0
> cxl5          0         0        0       0      0
> cxl6     2343 M     791 M    275 k   137 k      0  (IX, part of lagg0)
> cxl7     1675 M     762 M    215 k   133 k      0  (IX, part of lagg0)
> ixl0      913 k      18 M        0       0      0
> ixl1        1 M      30 M        0       0      0
> lagg0    4019 M    1554 M    491 k   271 k      0
> lagg1       1 M      48 M        0       0      0
>
> FreeBSD 12.1-STABLE  orange  976 Bytes/Packet avg
> 1:42PM  up 13:25, 5 users, load averages: 9.38, 10.43, 9.82

Hi,

did you try to use pmcstat to determine what is the heaviest task for
your system?

# kldload hwpmc
# pmcstat -S inst_retired.any -Tw1

Then capture the first several lines of the output and quit using 'q'.

Do you use a firewall?  Also, can you show a snapshot of the
`top -HPSIzts1` output?

-- 
WBR, Andrey V. Elsukov
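For digging deeper than the live -T view, pmcstat can also write samples to
a file and render a callgraph offline -- a minimal sketch, assuming the stock
hwpmc(4)/pmcstat(8) from the FreeBSD base system (file paths are illustrative):

# pmcstat -S inst_retired.any -O /tmp/samples.pmc
(let it run for a while under load, then stop with Ctrl-C)
# pmcstat -R /tmp/samples.pmc -G /tmp/callgraph.txt
# head -40 /tmp/callgraph.txt
(the hottest call chains are listed first)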
[Bug 194485] Userland cannot add IPv6 prefix routes
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194485

Alexander V. Chernikov changed:

           What    |Removed          |Added
           ---------------------------------------------
           Assignee|n...@freebsd.org |melif...@freebsd.org

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
Re: Issue with BGP router / high interrupt / Chelsio / FreeBSD 12.1
On 2/12/20 7:21 PM, Rudy wrote:
> I'm having issues with a box that is acting as a BGP router for my
> network.  3 Chelsio cards, two T5 and one T6.  It was working great
> until I turned up our first port on the T6.  It seems like traffic
> passing in from a T5 card and out the T6 causes a really high load
> (and high interrupts).

Looking better!  I made some changes based on BSDRP, which I hadn't known
about -- I think ifqmaxlen was the tunable I overlooked.

# https://github.com/ocochard/BSDRP/blob/master/BSDRP/Files/boot/loader.conf.local
net.link.ifqmaxlen="16384"

Also, I ran chelsio_affinity to bind queues to specific CPU cores.  The
script only supports a single T5 card; I am revising it and will submit a
patch that handles multiple T5 and T6 cards.

I made both changes at once and rebooted, so we'll never know which one
fixed it. ;)

Right now, I have:

#/boot/loader.conf
#
# https://wiki.freebsd.org/10gFreeBSD/Router
hw.cxgbe.toecaps_allowed="0"
hw.cxgbe.rdmacaps_allowed="0"
hw.cxgbe.iscsicaps_allowed="0"
hw.cxgbe.fcoecaps_allowed="0"
hw.cxgbe.holdoff_timer_idx=3
# Before FreeBSD 13, hyperthreading is bad on a router:
# https://calomel.org/freebsd_network_tuning.html
machdep.hyperthreading_allowed="0"
hw.cxgbe.nrxq=16
hw.cxgbe.ntxq=16
hw.cxgbe.qsize_rxq=4096
hw.cxgbe.qsize_txq=4096
#hw.cxgbe.pause_settings="0"
# https://github.com/ocochard/BSDRP/blob/master/BSDRP/Files/boot/loader.conf.local
net.link.ifqmaxlen="16384"

#/etc/sysctl.conf
# FRR needs big buffers for OSPF
kern.ipc.maxsockbuf=16777216
# Turn FEC off (doesn't work with Cogent)
dev.cc.0.fec=0
dev.cc.1.fec=0
# Entropy not from LAN ports... harvesting slows them down.
kern.random.harvest.mask=65551
net.inet.icmp.icmplim=400
net.inet.icmp.maskrepl=0
net.inet.icmp.log_redirect=0
net.inet.icmp.drop_redirect=1
net.inet.tcp.drop_synfin=1
net.inet.tcp.blackhole=2   # drop any TCP packets to closed ports
net.inet.tcp.msl=7500      # close lost TCP connections in 7.5 seconds (default 30)
net.inet.udp.blackhole=1   # drop any UDP packets to closed ports
# hw.intr_storm_threshold=9000
net.inet.tcp.tso=0
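A quick way to confirm those settings actually took effect after the reboot --
a minimal sketch using only base-system sysctl(8), and assuming the cxgbe(4)
loader tunables are exposed as read-only sysctls (as they are on recent
FreeBSD):

# sysctl net.link.ifqmaxlen hw.cxgbe.nrxq hw.cxgbe.ntxq
# sysctl hw.cxgbe.holdoff_timer_idx machdep.hyperthreading_allowed
# sysctl dev.cc.0.fec dev.cc.1.fec kern.random.harvest.mask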
Re: Issue with BGP router / high interrupt / Chelsio / FreeBSD 12.1
On Fri, Feb 14, 2020 at 6:25 PM Rudy wrote:
> On 2/12/20 7:21 PM, Rudy wrote:
> > I'm having issues with a box that is acting as a BGP router for my
> > network.  3 Chelsio cards, two T5 and one T6.  It was working great
> > until I turned up our first port on the T6.  It seems like traffic
> > passing in from a T5 card and out the T6 causes a really high load
> > (and high interrupts).
>
> Looking better!  I made some changes based on BSDRP, which I hadn't known
> about -- I think ifqmaxlen was the tunable I overlooked.
>
> # https://github.com/ocochard/BSDRP/blob/master/BSDRP/Files/boot/loader.conf.local
> net.link.ifqmaxlen="16384"

This net.link.ifqmaxlen was set to help in case of lagg usage: I was not
aware it could improve your use case.

From your first post, it looks like your setup is 2 packages, 10 cores,
20 threads (hyperthreading disabled), and you have configured your Chelsio
cards to use 16 queues (hw.cxgbe.nXxq=16).  It's a good thing to have a
power-of-2 number of queues with Chelsio, but I'm not sure it's a good idea
to spread those queues across the 2 packages.  So perhaps you should try:
1. reducing to 8 queues and binding them to the local NUMA domain (see the
   sketch after this message), or
2. keeping 16 queues, but re-enabling HyperThreading and binding them to
   the local domain too (on -head, with a recent CPU and
   machdep.hyperthreading_intr_allowed, using hyper-threading improves
   forwarding performance).

But anyway, even with 16 queues spread over 2 domains, you should see
better performance:
https://github.com/ocochard/netbenches/blob/master/Xeon_E5-2650v4_2x12Cores-Chelsio_T520-CR/hw.cxgbe.nXxq/results/fbsd12-stable.r354440.BSDRP.1.96/README.md

Notice that I never monitored the CPU load during my benches.

Increasing hw.cxgbe.holdoff_timer_idx was a good idea: I would expect lower
interrupt usage too.

Did you monitor the QPI link usage?  (kldload cpuctl && pcm-numa.x)
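A minimal sketch of option 1 above, using the stock cxgbe(4) tunables
already shown in this thread (the queue count of 8 is the suggestion, the
rest is assumed):

#/boot/loader.conf -- halve the queue count so one NUMA domain can serve them all
hw.cxgbe.nrxq=8
hw.cxgbe.ntxq=8

After the reboot, vmstat -i should show every t5nex/t6nex queue vector
bound to a CPU in the card's local domain.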
Re: Issue with BGP router / high interrupt / Chelsio / FreeBSD 12.1
On 2/14/20 10:00 AM, Olivier Cochard-Labbé wrote:
> This net.link.ifqmaxlen was set to help in case of lagg usage: I was not
> aware it could improve your use case.

Thanks for the feedback.  Maybe it was a coincidence -- load has crept back
up to 15.

> From your first post, it looks like your setup is 2 packages, 10 cores,
> 20 threads (hyperthreading disabled), and you have configured your Chelsio
> cards to use 16 queues.  It's a good thing to have a power-of-2 number of
> queues with Chelsio, but I'm not sure it's a good idea to spread those
> queues across the 2 packages.  So perhaps you should try:
> 1. reducing to 8 queues and binding them to the local NUMA domain, or
> 2. keeping 16 queues, but re-enabling HyperThreading and binding them to
>    the local domain too.
>
> But anyway, even with 16 queues spread over 2 domains, you should see
> better performance:
> https://github.com/ocochard/netbenches/blob/master/Xeon_E5-2650v4_2x12Cores-Chelsio_T520-CR/hw.cxgbe.nXxq/results/fbsd12-stable.r354440.BSDRP.1.96/README.md

OK, I can work on the chelsio_affinity script.
(an hour later...)  OK, tested and updated on GitHub.

> Notice that I never monitored the CPU load during my benches.
> Increasing hw.cxgbe.holdoff_timer_idx was a good idea: I would expect
> lower interrupt usage too.

I have some standard SNMP monitoring and can correlate the load spinning out
of control with ping loss and packet loss.

# vmstat -i | tail -1
Total   12217353774   324329

> Did you monitor the QPI link usage?  (kldload cpuctl && pcm-numa.x)

I haven't.  I'll look into that.  Hoping the numa-domain locking helps.
Currently I have things bound to the right domain; I just need to shrink the
queue size and reboot!
irq289: t6nex0:err:261 @cpu0(domain0):          0
irq290: t6nex0:evt:263 @cpu0(domain0):          4
irq291: t6nex0:0a0:265 @cpu1(domain0):          0
irq292: t6nex0:0a1:267 @cpu2(domain0):          0
irq293: t6nex0:0a2:269 @cpu3(domain0):          0
irq294: t6nex0:0a3:271 @cpu4(domain0):          0
irq295: t6nex0:0a4:273 @cpu5(domain0):          0
irq296: t6nex0:0a5:275 @cpu6(domain0):          0
irq297: t6nex0:0a6:277 @cpu7(domain0):          0
irq298: t6nex0:0a7:279 @cpu8(domain0):          0
irq299: t6nex0:0a8:281 @cpu9(domain0):          0
irq300: t6nex0:0a9:283 @cpu1(domain0):          0
irq301: t6nex0:0aa:285 @cpu2(domain0):          0
irq302: t6nex0:0ab:287 @cpu3(domain0):          0
irq303: t6nex0:0ac:289 @cpu4(domain0):          0
irq304: t6nex0:0ad:291 @cpu5(domain0):          0
irq305: t6nex0:0ae:293 @cpu6(domain0):          0
irq306: t6nex0:0af:295 @cpu7(domain0):          0
irq307: t6nex0:1a0:297 @cpu8(domain0):  185404641
irq308: t6nex0:1a1:299 @cpu9(domain0):  146802111
irq309: t6nex0:1a2:301 @cpu1(domain0):  133930820
irq310: t6nex0:1a3:303 @cpu2(domain0):  173156318
irq311: t6nex0:1a4:305 @cpu3(domain0):  132151349
irq312: t6nex0:1a5:307 @cpu4(domain0):  149108252
irq313: t6nex0:1a6:309 @cpu5(domain0):  149196634
irq314: t6nex0:1a7:311 @cpu6(domain0):  184211395
irq315: t6nex0:1a8:313 @cpu7(domain0):  151266056
irq316: t6nex0:1a9:315 @cpu8(domain0):  169259534
irq317: t6nex0:1aa:317 @cpu9(domain0):  164117244
irq318: t6nex0:1ab:319 @cpu1(domain0):  157471862
irq319: t6nex0:1ac:321 @cpu2(domain0):  127662140
irq320: t6nex0:1ad:323 @cpu3(domain0):  172750013
irq321: t6nex0:1ae:325 @cpu4(domain0):  173559485
irq322: t6nex0:1af:327 @cpu5(domain0):  227842473
irq323: t5nex0:err:329 @cpu0(domain1):          0
irq324: t5nex0:evt:331 @cpu0(domain1):          8
irq325: t5nex0:0a0:333 @cpu10(domain1):   1340449
irq326: t5nex0:0a1:335 @cpu11(domain1):   1128580
irq327: t5nex0:0a2:337 @cpu12(domain1):   1311599
irq328: t5nex0:0a3:339 @cpu13(domain1):   1157356
irq329: t5nex0:0a4:341 @cpu14(domain1):   1257426
irq330: t5nex0:0a5:343 @cpu15(domain1):   1169697
irq331: t5nex0:0a6:345 @cpu16(domain1):   1089689
irq332: t5nex0:0a7:347 @cpu17(domain1):   1117782
irq333: t5nex0:0a8:349 @cpu18(domain1):   1186770
irq334: t5nex0:0a9:351 @cpu19(domain1):   1147015
irq335: t5nex0:0aa:353 @cpu10(domain1):   1238148
irq336: t5nex0:0ab:355 @cpu11(domain1):   1134259
irq337: t5nex0:0ac:357 @cpu12(domain1):   1262301
irq338: t5nex0:0ad:359 @cpu13(domain1):   1233933
irq339: t5nex0:0ae:361 @cpu14(domain1):   1284298
irq340: t5nex0:0af:363 @cpu15(domain1):   1257873
irq341: t5nex0:1a0:365 @cpu16(domain1): 204307929
irq342: t5nex0:1a1:367 @cpu17(domain1): 221035308
irq343: t5nex0:1a2:369 @cpu18(domain1): 21
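For reference, the same re-pinning can be done by hand with cpuset(1) -- a
hedged sketch using IRQ/CPU numbers taken from the listing above (presumably
the same mechanism a script like chelsio_affinity drives):

# vmstat -ia | grep t5nex0
(list the card's queue vectors and their IRQ numbers)
# cpuset -l 16 -x 341
(bind irq341, t5nex0:1a0, to CPU 16 in the card's local domain1)
# cpuset -l 17 -x 342
(bind irq342, t5nex0:1a1, to CPU 17, and so on round-robin)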
Re: Issue with BGP router / high interrupt / Chelsio / FreeBSD 12.1
On 2/14/20 4:21 AM, Andrey V. Elsukov wrote:
> On 13.02.2020 06:21, Rudy wrote:
> > I'm having issues with a box that is acting as a BGP router for my
> > network.  3 Chelsio cards, two T5 and one T6.  It was working great
> > until I turned up our first port on the T6.  It seems like traffic
> > passing in from a T5 card and out the T6 causes a really high load
> > (and high interrupts).
> >
> > [traffic table snipped]
>
> Hi,
>
> did you try to use pmcstat to determine what is the heaviest task for
> your system?
>
> # kldload hwpmc
> # pmcstat -S inst_retired.any -Tw1

PMC: [inst_retired.any] Samples: 168557 (100.0%), 2575 unresolved

%SAMP IMAGE  FUNCTION              CALLERS
 16.6 kernel sched_idletd          fork_exit
 14.7 kernel cpu_search_highest    cpu_search_highest:12.4 sched_switch:1.4 sched_idletd:0.9
 10.5 kernel cpu_search_lowest     cpu_search_lowest:9.6 sched_pickcpu:0.9
  4.2 kernel eth_tx                drain_ring
  3.4 kernel rn_match              fib4_lookup_nh_basic
  2.4 kernel lock_delay            __mtx_lock_sleep
  1.9 kernel mac_ifnet_check_tran  ether_output

> Then capture the first several lines of the output and quit using 'q'.
>
> Do you use a firewall?  Also, can you show a snapshot of the
> `top -HPSIzts1` output?
last pid: 28863;  load averages: 9.30, 10.33, 10.56    up 0+14:16:08  14:53:23
817 threads:   25 running, 586 sleeping, 206 waiting
CPU 0:   0.8% user, 0.0% nice, 6.2% system,  0.0% interrupt, 93.0% idle
CPU 1:   2.4% user, 0.0% nice, 0.0% system,  7.9% interrupt, 89.8% idle
CPU 2:   0.0% user, 0.0% nice, 0.8% system,  7.1% interrupt, 92.1% idle
CPU 3:   1.6% user, 0.0% nice, 0.0% system, 10.2% interrupt, 88.2% idle
CPU 4:   0.0% user, 0.0% nice, 0.0% system,  9.4% interrupt, 90.6% idle
CPU 5:   0.8% user, 0.0% nice, 0.8% system, 20.5% interrupt, 78.0% idle
CPU 6:   1.6% user, 0.0% nice, 0.0% system,  5.5% interrupt, 92.9% idle
CPU 7:   0.0% user, 0.0% nice, 0.0% system,  3.1% interrupt, 96.9% idle
CPU 8:   0.8% user, 0.0% nice, 0.8% system,  7.1% interrupt, 91.3% idle
CPU 9:   0.0% user, 0.0% nice, 0.8% system,  9.4% interrupt, 89.8% idle
CPU 10:  0.0% user, 0.0% nice, 0.0% system, 35.4% interrupt, 64.6% idle
CPU 11:  0.0% user, 0.0% nice, 0.0% system, 36.2% interrupt, 63.8% idle
CPU 12:  0.0% user, 0.0% nice, 0.0% system, 38.6% interrupt, 61.4% idle
CPU 13:  0.0% user, 0.0% nice, 0.0% system, 49.6% interrupt, 50.4% idle
CPU 14:  0.0% user, 0.0% nice, 0.0% system, 46.5% interrupt, 53.5% idle
CPU 15:  0.0% user, 0.0% nice, 0.0% system, 32.3% interrupt, 67.7% idle
CPU 16:  0.0% user, 0.0% nice, 0.0% system, 46.5% interrupt, 53.5% idle
CPU 17:  0.0% user, 0.0% nice, 0.0% system, 56.7% interrupt, 43.3% idle
CPU 18:  0.0% user, 0.0% nice, 0.0% system, 31.5% interrupt, 68.5% idle
CPU 19:  0.0% user, 0.0% nice, 0.8% system, 34.6% interrupt, 64.6% idle
Mem: 636M Active, 1159M Inact, 5578M Wired, 24G Free
ARC: 1430M Total, 327M MFU, 589M MRU, 32K Anon, 13M Header, 502M Other
     268M Compressed, 672M Uncompressed, 2.51:1 Ratio
Swap: 4096M Total, 4096M Free

  PID USERNAME PRI NICE  SIZE   RES STATE   C    TIME   WCPU COMMAND
   12 root     -92    -    0B 3376K WAIT   13   41:13 12.86% intr{irq358: t5nex0:2a1}
   12 root     -92    -    0B 3376K WAIT   12   48:08 12.77% intr{irq347: t5nex0:1a6}
   12 root     -92    -    0B 3376K CPU13  13   47:40 11.96% intr{irq348: t5nex0:1a7}
   12 root     -92    -    0B 3376K WAIT   17   43:46 11.38% intr{irq342: t5nex0:1a1}
   12 root     -92    -    0B 3376K WAIT   14   29:17 10.70% intr{irq369: t5nex0:2ac}
   12 root     -92    -    0B 3376K WAIT   11   47:55  9.85% intr{irq428: t5nex1:2a5}
   12 root     -92    -    0B 3376K WAIT   16   46:11  9.22% intr{irq351: t5nex0:1aa}
   12 root     -92    -    0B 3
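To see at a glance whether a tuning change actually lowers the interrupt
load shown above, the per-device rates can be watched live with base-system
tools only -- a minimal sketch:

# systat -vmstat 1
(live per-device interrupt rates, refreshed every second)
# vmstat -i | tail -1
(running total and average rate, as quoted earlier in the thread)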
Re: chelsio_affinity patch to support t6 cards
On 2/13/20 9:56 PM, Rudy wrote:
> Supports t6 as well as t5 cards.

Also, is this desired?
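For readers without the patch at hand, a rough, hypothetical sketch of the
kind of loop such an affinity script performs -- this is NOT the actual
patch; the IRQ-name pattern, CPU range, and parsing are all assumptions
based on the interrupt listing earlier in the thread:

#!/bin/sh
# Hypothetical: bind every t5nex0 queue IRQ round-robin to CPUs 10-19
# (the card's local NUMA domain on this thread's hardware).
cpu=10
for irq in $(vmstat -ia | awk '/t5nex0:[0-9a-f]+a/ { gsub(/[^0-9]/, "", $1); print $1 }'); do
    # cpuset(1) re-pins the given interrupt to a single CPU.
    cpuset -l "$cpu" -x "$irq"
    cpu=$(( cpu >= 19 ? 10 : cpu + 1 ))
done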