Re: Proposed 6.2 em RELEASE patch
Our routers have two em NICs and forward about 100 kpps. Without kernel
polling the system becomes unstable; it seems the default interrupt
moderation on em is 1 intr/sec. I have made some modifications in the kernel
to reduce packet drops when polling is enabled:

- modify the clock routines to allow a high clock rate only for polling
  (5000 polls/sec)
- remove netisr_poll_more
- loop in ether_poll() until there are ready packets

Small optimizations in em() (see the sketch after this message):

- add CSUM_IP checksum offloading on transmit
- call em_txeof() if there are more than 32 busy packets
- remove E1000_TXD_CMD_RS (Report Status) and instead check
  (TDH == adapter->oldest_used_tx_desc). This should reduce PCI overhead,
  but adds one more PCI read on every em_txeof() call.

OS: FreeBSD 4.11; em() is almost up to date with HEAD.

- Original Message -
From: "Scott Long" <[EMAIL PROTECTED]>
To: "Mike Tancsa" <[EMAIL PROTECTED]>
Cc: "freebsd-net" ; ; "Jack Vogel" <[EMAIL PROTECTED]>
Sent: Saturday, November 11, 2006 8:42 AM
Subject: Re: Proposed 6.2 em RELEASE patch

> Mike Tancsa wrote:
> > At 05:00 PM 11/10/2006, Jack Vogel wrote:
> >> On 11/10/06, Mike Tancsa <[EMAIL PROTECTED]> wrote:
> >>>
> >>> Some more tests. I tried again with what was committed to today's
> >>> RELENG_6. I am guessing it's pretty well the same patch. Polling is
> >>> the only way to avoid livelock at a high pps rate. Does anyone know
> >>> of any simple tools to measure end-to-end packet loss? Polling will
> >>> end up dropping some packets and I want to be able to compare. Same
> >>> hardware as in the previous post.
> >>
> >> The commit WAS the last patch I posted. So, making sure I understood you,
> >> you are saying that POLLING is doing better than FAST_INTR, or only
> >> better than the legacy code that went in with my merge?
> >
> > Hi,
> > The last set of tests I posted are ONLY with what is in today's
> > RELENG_6 -- i.e. the latest commit. I did a few variations on the
> > driver -- first with
> > #define EM_FAST_INTR 1
> > in if_em.c
> >
> > one without
> >
> > and one with polling in the kernel.
> >
> > With a decent packet rate passing through, the box will lock up. Not
> > sure if I am just hitting the limits of the PCIe bus, or interrupt
> > moderation is not kicking in, or this is a case of "Doctor, it hurts
> > when I send a lot of packets through"... "Well, don't do that".
> >
> > Using polling prevents the lockup, but it will of course drop packets.
> > This is for firewalls with a fairly high bandwidth rate, and I also
> > need it to be able to survive a decent DDoS attack. I am not looking
> > for 1 Mpps, but something more than 100 Kpps.
> >
> > ---Mike
>
> Hi,
>
> Thanks for all of the data. I know that a good amount of testing was
> done with single-stream stress tests, but it's not clear how much was
> done with multiple streams prior to your efforts. So, I'm not terribly
> surprised by your results. I'm still a bit unclear on the exact
> topology of your setup, so if you could explain it some more in private
> email, I'd appreciate it.
>
> For the short term, I don't think that there is anything that can be
> magically tweaked that will safely give better results. I know that
> Gleb has some ideas on a fairly simple change for the non-INTR_FAST,
> non-POLLING case, but I and several others worry that it's not robust
> in the face of real-world network problems.
>
> For the long term, I have a number of ideas for improving both the RX
> and TX paths in the driver. Some of it is specific to the if_em driver,
> some involve improvements in the FFWD and PFIL_HOOKS code as well as the
> driver. What will help me is if you can hook up a serial console to
> your machine and see if it can be made to drop to the debugger while it
> is under load and otherwise unresponsive. If you can, getting a process
> dump might help confirm where each CPU is spending its time.
>
> Scott
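To illustrate the transmit-descriptor change described at the top of this
message: the sketch below is a small self-contained model, not the actual
if_em.c patch. oldest_used_tx_desc, the 32-busy-packet trigger, and the TDH
comparison come from the description above; the adapter_model struct,
NUM_TX_DESC, read_tdh(), and the numbers in main() are assumptions made up
for the example.

    #include <stdio.h>

    #define NUM_TX_DESC 256             /* ring size, arbitrary for the model */

    struct adapter_model {
            unsigned int oldest_used_tx_desc;  /* oldest descriptor not yet reclaimed */
            unsigned int num_tx_desc_avail;    /* free descriptors */
    };

    /* Stand-in for the one PCI read of the hardware transmit head (TDH). */
    static unsigned int hw_tdh;

    static unsigned int
    read_tdh(void)
    {
            return (hw_tdh);
    }

    /*
     * Model of the reworked em_txeof(): rather than checking a status bit
     * the NIC wrote back (E1000_TXD_CMD_RS), read TDH once and reclaim
     * every descriptor the hardware has already moved past.
     */
    static void
    em_txeof_model(struct adapter_model *a)
    {
            unsigned int head = read_tdh();

            while (a->oldest_used_tx_desc != head) {
                    /* ... the real driver would free the mbuf here ... */
                    a->oldest_used_tx_desc =
                        (a->oldest_used_tx_desc + 1) % NUM_TX_DESC;
                    a->num_tx_desc_avail++;
            }
    }

    int
    main(void)
    {
            struct adapter_model a = { 0, NUM_TX_DESC };

            /* Pretend the driver queued 40 packets and the NIC sent 35. */
            a.num_tx_desc_avail -= 40;
            hw_tdh = 35;

            /* The "more than 32 busy packets" trigger from the list above. */
            if (NUM_TX_DESC - a.num_tx_desc_avail > 32)
                    em_txeof_model(&a);

            printf("oldest_used_tx_desc=%u num_tx_desc_avail=%u\n",
                a.oldest_used_tx_desc, a.num_tx_desc_avail);
            return (0);
    }

The trade-off is exactly the one noted above: no per-descriptor status
write-backs, at the cost of one extra register read per cleanup call.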
pf table synchronization between redundant routers (pfsync?)
Hi all,

I'm thinking about adding support to pfsync for synchronizing pf tables, so
that it can be used in a redundant firewall/router setup. At first glance it
looks fairly simple: just send/receive a message containing the table name,
the prefix, and the action ("add" or "remove"). Has anyone tried something
like this? A rough sketch of what such a message might look like follows
below.

The other thing that comes to mind is, for example, a patched routed that
works on pf tables instead of the kernel routing table.

P.S.: I know about pftabled, but I'm looking for a different solution.

-- Niki
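To make the message idea above concrete, one possible layout could look like
this. This is purely a sketch, not an existing pfsync message type; the
struct and field names, the action constants, and the fixed 32-byte name
field (matching pf's PF_TABLE_NAME_SIZE) are my own assumptions.

    /*
     * Hypothetical table-sync message -- names and layout are illustrative
     * assumptions, not part of the existing pfsync protocol.
     */
    #include <sys/types.h>
    #include <netinet/in.h>

    #define PFSYNC_TBL_ADD  1
    #define PFSYNC_TBL_DEL  2

    struct pfsync_table_update {
            char            tu_table[32];   /* table name (PF_TABLE_NAME_SIZE) */
            struct in6_addr tu_addr;        /* prefix; v4 addresses mapped */
            u_int8_t        tu_af;          /* AF_INET or AF_INET6 */
            u_int8_t        tu_plen;        /* prefix length */
            u_int8_t        tu_action;      /* PFSYNC_TBL_ADD or PFSYNC_TBL_DEL */
            u_int8_t        tu_pad;         /* keep the struct 4-byte aligned */
    };

On receipt, the peer would apply the update to its local table, roughly what
"pfctl -t <name> -T add|delete <prefix>" does from userland.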
Re: Proposed 6.2 em RELEASE patch
At 01:42 AM 11/11/2006, Scott Long wrote:
> surprised by your results. I'm still a bit unclear on the exact
> topology of your setup, so if you could explain it some more in private
> email, I'd appreciate it.

Hi,

I made a quick diagram of the test setup that should make it clearer:
http://www.tancsa.com/blast.jpg

Basically five boxes (plus my workstation for out-of-band access). The main
one being tested is the box marked R2, which has a 2-port PCIe em NIC (Pro
1000PT) in the motherboard's 4X slot. I have two test boxes as UDP senders
and two test boxes as UDP receivers, and all the packets flow through the
two interfaces of R2.

With one stream of packets being blasted across, the box drops some packets
even on its OOB management interface. With two, it's totally unresponsive.
Only with polling am I able to continue to work on the box via the OOB
interface while one and even two streams of UDP packets are blasting across.
However, in polling mode some amount of packets are being dropped, and I
guess I need to better understand how many.

My goal in all this is to have a firewall/router that can withstand a high
pps workload and still be reachable OOB when under attack or even under a
high normal workload. To measure how many packets are dropped, I was looking
at making a modified netreceive that counts the packets it gets (a rough
sketch follows below), so I can test whether polling mode will be adequate
for my needs.

Let's say the max pps the box can handle is X, either in polling or
non-polling mode. As the box approaches X and gets pushed beyond X, the
ideal situation for my needs would be that it drops some packets on the
busiest interface so that it can still function and service its other needs,
be that network, disk, whatever. But my question is: is X the same for
polling and non-polling modes?

> For the short term, I don't think that there is anything that can be
> magically tweaked that will safely give better results. I know that
> Gleb has some ideas on a fairly simple change for the non-INTR_FAST,
> non-POLLING case, but I and several others worry that it's not robust
> in the face of real-world network problems.
>
> For the long term, I have a number of ideas for improving both the RX
> and TX paths in the driver. Some of it is specific to the if_em driver,
> some involve improvements in the FFWD and PFIL_HOOKS code as well as the
> driver. What will help me is if you can hook up a serial console to
> your machine and see if it can be made to drop to the debugger while it
> is under load and otherwise unresponsive. If you can, getting a process
> dump might help confirm where each CPU is spending its time.

Yes, I will see what I can do over the weekend. I have some changes to
babysit again tomorrow night and will see what I can do between cycles.

---Mike
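For the counting receiver mentioned above, something along these lines would
do. It is only a sketch, not the actual netreceive from the netrate tools:
it takes a UDP port, counts every datagram it receives, and prints the total
on Ctrl-C so the count can be compared with what the sender claims to have
transmitted.

    /*
     * Minimal UDP datagram counter (sketch): bind to a port, count every
     * datagram received, print the total on SIGINT.
     */
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static volatile sig_atomic_t done;

    static void
    handle_sigint(int sig)
    {
            done = 1;
    }

    int
    main(int argc, char **argv)
    {
            struct sigaction sa;
            struct sockaddr_in sin;
            char buf[2048];
            unsigned long long count = 0;
            int s;

            if (argc != 2) {
                    fprintf(stderr, "usage: %s port\n", argv[0]);
                    return (1);
            }

            if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
                    perror("socket");
                    return (1);
            }

            memset(&sin, 0, sizeof(sin));
            sin.sin_family = AF_INET;
            sin.sin_addr.s_addr = htonl(INADDR_ANY);
            sin.sin_port = htons((unsigned short)atoi(argv[1]));
            if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
                    perror("bind");
                    return (1);
            }

            /* No SA_RESTART, so recv() is interrupted by Ctrl-C. */
            memset(&sa, 0, sizeof(sa));
            sa.sa_handler = handle_sigint;
            sigaction(SIGINT, &sa, NULL);

            while (!done) {
                    if (recv(s, buf, sizeof(buf), 0) > 0)
                            count++;
            }
            printf("datagrams received: %llu\n", count);
            return (0);
    }

Run one instance per receiver box, blast UDP at it from the senders, and
compare the printed count with the sender's packet count to get the
end-to-end drop rate. Drops caused by the receiver's own socket buffer
overflowing show up separately in netstat -s, so they can be told apart from
drops in R2.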
Re: Proposed 6.2 em RELEASE patch
At 01:42 AM 11/11/2006, Scott Long wrote:
> driver. What will help me is if you can hook up a serial console to
> your machine and see if it can be made to drop to the debugger while it
> is under load and otherwise unresponsive. If you can, getting a process
> dump might help confirm where each CPU is spending its time.

./netblast 192.168.88.218 500 110 1000

I compiled in the various debugging options, and on the serial console I get
a few

Expensive timeout(9) function: 0xc0601e48(0) 0.024135749 s

messages. Then, on the serial console:

telnet> send break
Expensive timeout(9) function: 0xc0561444(0xc63f1d80) 0.072485748 s
KDB: enter: Line break on console
[thread pid 27 tid 100017 ]
Stopped at kdb_enter+0x2b: nop
db>
db> ps
  pid ppid pgrp uid state wmesg wchan cmd
 1206 1123 1206 0 R+ ifstat
 1155 1154 1155 0 R+ csh
 1154 1 1154 0 Ss+ wait 0xc6722218 login
 1123 1122 1123 0 S+ pause 0xc6722894 csh
 1122 1117 1122 1002 S+ wait 0xc6739430 su
 1117 1116 1117 1002 Ss+ pause 0xc6aa024c csh
 1116 1114 1114 1002 R sshd
 1114 1028 1114 0 Ss sbwait 0xc6893370 sshd
 1112 1 1112 0 Ss+ ttyin 0xc65ba810 getty
 1 0 Ss+ ttyin 0xc65bac10 getty
 1110 1 1110 0 Ss+ ttyin 0xc65bb010 getty
 1109 1 1109 0 Ss+ ttyin 0xc65bc410 getty
 1108 1 1108 0 Ss+ ttyin 0xc65b4010 getty
 1107 1 1107 0 Ss+ ttyin 0xc65bd010 getty
 1106 1 1106 0 Ss+ ttyin 0xc65bcc10 getty
 1105 1 1105 0 Ss+ ttyin 0xc65b2010 getty
 1044 1 1044 0 Ss nanslp 0xc076ecac cron
 1038 1 1038 25 Ss pause 0xc6ab2aac sendmail
 1034 1 1034 0 Rs sendmail
 1028 1 1028 0 Ss select 0xc07bc004 sshd
 898 1 898 0 Ss bo_wwait 0xc6ac9130 syslogd
 846 1 846 0 Ss select 0xc07bc004 devd
 445 1 445 65 Ss select 0xc07bc004 dhclient
 425 143 0 S+ select 0xc07bc004 dhclient
 124 1 124 0 Ss pause 0xc6739034 adjkerntz
 42 0 0 0 SL - 0xe8ff9d04 [schedcpu]
 41 0 0 0 SL sdflush 0xc07c50f4 [softdepflush]
 40 0 0 0 RL [syncer]
 39 0 0 0 RL [vnlru]
 38 0 0 0 SL psleep 0xc07bc56c [bufdaemon]
 37 0 0 0 SL pgzero 0xc07c6064 [pagezero]
 36 0 0 0 SL psleep 0xc07c5bb4 [vmdaemon]
 35 0 0 0 SL psleep 0xc07c5b70 [pagedaemon]
 34 0 0 0 WL [irq1: atkbd0]
 33 0 0 0 WL [irq7: ppc0]
 32 0 0 0 WL [swi0: sio]
 31 0 0 0 RL [acpi_cooling0]
 30 0 0 0 SL tzpoll 0xc08cd838 [acpi_thermal]
 29 0 0 0 WL [irq19: bge1]
 28 0 0 0 WL [irq16: bge0+]
 27 0 0 0 RL CPU 1 [irq18: em1]
 26 0 0 0 WL [irq17: em0]
 25 0 0 0 WL [irq23: nve0]
 24 0 0 0 WL [irq22: atapci2]
 23 0 0 0 WL [irq21: atapci1]
 22 0 0 0 WL [irq15: ata1]
 21 0 0 0 WL [irq14: ata0]
 20 0 0 0 WL [irq9: acpi0]
 9 0 0 0 SL - 0xc645f080 [kqueue taskq]
 19 0 0 0 WL [swi2: cambio]
 8 0 0 0 SL - 0xc645f280 [acpi_task_2]
 7 0 0 0 SL - 0xc645f280 [acpi_task_1]
 6 0 0 0 SL - 0xc645f280 [acpi_task_0]
 18 0 0 0 WL [swi5: +]
 5 0 0 0 SL - 0xc645f400 [thread taskq]
 17 0 0 0 WL [swi6: Giant taskq]
 16 0 0 0 WL [swi6: task queue]
 15 0 0 0 RL [yarrow]
 4 0 0 0 RL [g_down]
 3 0 0 0 RL [g_up]
 2 0 0 0 RL [g_event]
 14 0 0 0 WL [swi3: vm]
 13 0 0 0 RL CPU 0 [swi4: clock sio]
 12 0 0 0 WL [swi1: net]
 11 0 0 0 RL [idle: cpu0]
 10 0 0 0 RL [idle: cpu1]
 1 0 1