Re: Proposed 6.2 em RELEASE patch

2006-11-11 Thread Mihail Balikov
Our routers have 2 em NICs each, doing about 100 kpps.  Without kernel
polling the system becomes unstable; it seems that the default interrupt
moderation on em is 1 intr/sec.

I have made some modifications to the kernel to decrease packet drops when
polling is enabled:
- modify the clock routines to allow a high clock rate only for polling (5000
polls/sec)
- remove netisr_poll_more
- loop in ether_poll() while there are packets ready (rough sketch below)
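
Roughly, the loop idea looks like this (just a sketch, not the actual
kern_poll.c code; the assumption that a poll handler returns the number of
packets it consumed is mine -- in the real interface the remaining work has
to be detected some other way, e.g. by re-checking the RX ring):

/*
 * Keep calling the registered poll handlers until none of them consumes a
 * full burst, i.e. until the RX rings are drained.
 */
static void
ether_poll_loop(int burst)
{
        int i, again;

        do {
                again = 0;
                for (i = 0; i < poll_handlers; i++) {
                        /* assumed: handler returns packets consumed */
                        if (pr[i].handler(pr[i].ifp, POLL_ONLY, burst) == burst)
                                again = 1;      /* ring may still be busy */
                }
        } while (again);
}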

Small optimizations in em():
- add CSUM_IP checksum offloading on transmit
- call em_txeof() if there are more than 32 busy packets
- remove E1000_TXD_CMD_RS (Report Status) and instead check (TDH ==
adapter->oldest_used_tx_desc). This should reduce PCI overhead, but adds one
more PCI read on every em_txeof() call (sketch below).
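
The TDH-based cleanup is roughly like this (a sketch only, not the real
patch; the register-access macro and the em_free_tx_buffer() helper are
placeholders for illustration):

static void
em_txeof_tdh(struct adapter *adapter)
{
        u_int32_t head;

        /* the extra PCI read: ask the NIC how far it has gotten */
        head = E1000_READ_REG(&adapter->hw, TDH);

        /* reclaim every descriptor the hardware has already fetched */
        while (adapter->oldest_used_tx_desc != head) {
                em_free_tx_buffer(adapter, adapter->oldest_used_tx_desc);
                adapter->oldest_used_tx_desc =
                    (adapter->oldest_used_tx_desc + 1) % adapter->num_tx_desc;
                adapter->num_tx_desc_avail++;
        }
}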

OS: FreeBSD 4.11, em() is almost up to date with HEAD.


- Original Message - 
From: "Scott Long" <[EMAIL PROTECTED]>
To: "Mike Tancsa" <[EMAIL PROTECTED]>
Cc: "freebsd-net" ; ;
"Jack Vogel" <[EMAIL PROTECTED]>
Sent: Saturday, November 11, 2006 8:42 AM
Subject: Re: Proposed 6.2 em RELEASE patch


> Mike Tancsa wrote:
> > At 05:00 PM 11/10/2006, Jack Vogel wrote:
> >> On 11/10/06, Mike Tancsa <[EMAIL PROTECTED]> wrote:
> >>>
> >>> Some more tests. I tried again with what was committed to today's
> >>> RELENG_6. I am guessing it's pretty well the same patch.  Polling is
> >>> the only way to avoid livelock at a high pps rate.  Does anyone know
> >>> of any simple tools to measure end-to-end packet loss?  Polling will
> >>> end up dropping some packets and I want to be able to compare.  Same
> >>> hardware as in the previous post.
> >>
> >> The commit WAS the last patch I posted. SO, making sure I understood you,
> >> you are saying that POLLING is doing better than FAST_INTR, or only
> >> better than the legacy code that went in with my merge?
> >
> > Hi,
> > The last set of tests I posted is ONLY with what is in today's
> > RELENG_6-- i.e. the latest commit. I did a few variations on the
> > driver-- first with
> > #define EM_FAST_INTR 1
> > in if_em.c
> >
> > one without
> >
> > and one with polling in the kernel.
> >
> > With a decent packet rate passing through, the box will lock up.  Not
> > sure if I am just hitting the limits of the PCIe bus, or interrupt
> > moderation is not kicking in, or this is a case of "Doctor, it hurts
> > when I send a lot of packets through"... "Well, don't do that".
> >
> > Using polling prevents the lockup, but it will of course drop packets.
> > This is for firewalls with a fairly high bandwidth rate, and I also
> > need it to be able to survive a decent DDoS attack.  I am not looking
> > for 1Mpps, but something more than 100Kpps.
> >
> > ---Mike
>
> Hi,
>
> Thanks for all of the data.  I know that a good amount of testing was
> done with single stream stress tests, but it's not clear how much was
> done with multiple streams prior to your efforts.  So, I'm not terribly
> surprised by your results.  I'm still a bit unclear on the exact
> topology of your setup, so if you could explain it some more in private
> email, I'd appreciate it.
>
> For the short term, I don't think that there is anything that can be
> magically tweaked that will safely give better results.  I know that
> Gleb has some ideas on a fairly simple change for the non-INTR_FAST,
> non-POLLING case, but I and several others worry that it's not robust
> in the face of real-world network problems.
>
> For the long term, I have a number of ideas for improving both the RX
> and TX paths in the driver.  Some of it is specific to the if_em driver,
> some involve improvements in the FFWD and PFIL_HOOKS code as well as the
> driver.  What will help me is if you can hook up a serial console to
> your machine and see if it can be made to drop to the debugger while it
> is under load and otherwise unresponsive.  If you can, getting a process
> dump might help confirm where each CPU is spending its time.
>
> Scott



pf table synchronization between redundant routers (pfsync?)

2006-11-11 Thread Nikolay Denev

Hi all,

I'm thinking about adding support to pfsync for synchronizing pf
tables, so that it can be used in a redundant firewall/router setup.

At first glance it looks fairly simple: just send/receive a message
containing the table name, the prefix, and the action ("add" or
"remove").


Has anyone tried something like this?
The other thing that comes to mind is, for example, a patched routed
that would work on pf tables instead of the kernel routing table.


P.S.: I know about pftabled, but I'm looking for a different solution.

--
Niki


Re: Proposed 6.2 em RELEASE patch

2006-11-11 Thread Mike Tancsa

At 01:42 AM 11/11/2006, Scott Long wrote:


surprised by your results.  I'm still a bit unclear on the exact
topology of your setup, so if you could explain it some more in private
email, I'd appreciate it.


Hi,
I made a quick diagram of the test setup that should make it 
more clear


http://www.tancsa.com/blast.jpg

Basically there are 5 boxes (plus my workstation for out-of-band access);
the main one being tested is the box marked R2, which has a 2-port PCIe
em NIC (Pro 1000PT) in the motherboard's 4X slot.  I have 2 test boxes
as UDP senders and 2 test boxes as UDP receivers, and all the packets
flow through the 2 interfaces of R2.  With one stream of packets being
blasted across, the box is dropping some packets even on its OOB
management interface.  With 2, it's totally unresponsive.  Only with
polling am I able to continue to work on the box via the OOB interface
while one and even 2 streams of UDP packets are blasting across.
However, in polling mode some packets are being dropped, and I guess I
need to better understand how many.  My goal in all this is to have a
firewall / router that can withstand a high pps workload and still be
reachable OOB when under attack or simply under heavy load.


To measure how many packets are dropped, I was looking at making a
modified netreceive that counts the packets it gets, so I can test
whether polling mode will be adequate for my needs.
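
Something along these lines (a standalone sketch, not a patch to the real
netreceive; the default port of 500 and the buffer size are arbitrary):

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <err.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static volatile sig_atomic_t done;

static void
handler(int sig)
{
        (void)sig;
        done = 1;
}

int
main(int argc, char *argv[])
{
        struct sockaddr_in sin;
        struct sigaction sa;
        char buf[2048];
        unsigned long long packets = 0;
        int s;

        if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
                err(1, "socket");
        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl(INADDR_ANY);
        sin.sin_port = htons(argc > 1 ? atoi(argv[1]) : 500);
        if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
                err(1, "bind");

        /* no SA_RESTART, so recv() is interrupted by ^C and we fall out */
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = handler;
        sigaction(SIGINT, &sa, NULL);

        while (!done) {
                if (recv(s, buf, sizeof(buf), 0) > 0)
                        packets++;
        }

        /* compare this against the count the sender reports */
        printf("received %llu packets\n", packets);
        return (0);
}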


Let's say the max pps the box can handle is X, either in polling or
non-polling mode.  As the box approaches X and gets pushed beyond X, I
guess the ideal situation for my needs would be that it drops some
packets on the busiest interface so that it can still function and
service its other needs, be that network, disk, whatever.  But my
question is: is X the same for polling and non-polling modes?




For the short term, I don't think that there is anything that can be
magically tweaked that will safely give better results.  I know that
Gleb has some ideas on a fairly simple change for the non-INTR_FAST,
non-POLLING case, but I and several others worry that it's not robust
in the face of real-world network problems.

For the long term, I have a number of ideas for improving both the RX
and TX paths in the driver.  Some of it is specific to the if_em driver,
some involve improvements in the FFWD and PFIL_HOOKS code as well as the
driver.  What will help me is if you can hook up a serial console to
your machine and see if it can be made to drop to the debugger while it
is under load and otherwise unresponsive.  If you can, getting a process
dump might help confirm where each CPU is spending its time.


Yes, I will see what I can do over the weekend. I have some changes 
to babysit again tomorrow night and will see what I can do between cycles.


---Mike 




Re: Proposed 6.2 em RELEASE patch

2006-11-11 Thread Mike Tancsa

At 01:42 AM 11/11/2006, Scott Long wrote:
driver.  What will help me is if you can hook up a serial console to
your machine and see if it can be made to drop to the debugger while it
is under load and otherwise unresponsive.  If you can, getting a process
dump might help confirm where each CPU is spending its time.



./netblast 192.168.88.218 500 110 1000

I compiled in the various debugging options, and on the serial console
I get a few messages like


Expensive timeout(9) function: 0xc0601e48(0) 0.024135749 s
and then on the serial console:


telnet> send break
Expensive timeout(9) function: 0xc0561444(0xc63f1d80) 0.072485748 s
KDB: enter: Line break on console
[thread pid 27 tid 100017 ]
Stopped at  kdb_enter+0x2b: nop
db>

db> ps
  pid  ppid  pgrp   uid   state   wmesg    wchan      cmd
 1206  1123  1206 0  R+  ifstat
 1155  1154  1155 0  R+  csh
 1154 1  1154 0  Ss+ wait     0xc6722218 login
 1123  1122  1123 0  S+  pause    0xc6722894 csh
 1122  1117  1122  1002  S+  wait     0xc6739430 su
 1117  1116  1117  1002  Ss+ pause    0xc6aa024c csh
 1116  1114  1114  1002  R   sshd
 1114  1028  1114 0  Ss  sbwait   0xc6893370 sshd
 1112 1  1112 0  Ss+ ttyin    0xc65ba810 getty
 1111 1  1111 0  Ss+ ttyin    0xc65bac10 getty
 1110 1  1110 0  Ss+ ttyin    0xc65bb010 getty
 1109 1  1109 0  Ss+ ttyin    0xc65bc410 getty
 1108 1  1108 0  Ss+ ttyin    0xc65b4010 getty
 1107 1  1107 0  Ss+ ttyin    0xc65bd010 getty
 1106 1  1106 0  Ss+ ttyin    0xc65bcc10 getty
 1105 1  1105 0  Ss+ ttyin    0xc65b2010 getty
 1044 1  1044 0  Ss  nanslp   0xc076ecac cron
 1038 1  1038    25  Ss  pause    0xc6ab2aac sendmail
 1034 1  1034 0  Rs  sendmail
 1028 1  1028 0  Ss  select   0xc07bc004 sshd
  898 1   898 0  Ss  bo_wwait 0xc6ac9130 syslogd
  846 1   846 0  Ss  select   0xc07bc004 devd
  445 1   445    65  Ss  select   0xc07bc004 dhclient
  425 143 0  S+  select   0xc07bc004 dhclient
  124 1   124 0  Ss  pause    0xc6739034 adjkerntz
   42 0 0 0  SL  -        0xe8ff9d04 [schedcpu]
   41 0 0 0  SL  sdflush  0xc07c50f4 [softdepflush]
   40 0 0 0  RL  [syncer]
   39 0 0 0  RL  [vnlru]
   38 0 0 0  SL  psleep   0xc07bc56c [bufdaemon]
   37 0 0 0  SL  pgzero   0xc07c6064 [pagezero]
   36 0 0 0  SL  psleep   0xc07c5bb4 [vmdaemon]
   35 0 0 0  SL  psleep   0xc07c5b70 [pagedaemon]
   34 0 0 0  WL  [irq1: atkbd0]
   33 0 0 0  WL  [irq7: ppc0]
   32 0 0 0  WL  [swi0: sio]
   31 0 0 0  RL  [acpi_cooling0]
   30 0 0 0  SL  tzpoll   0xc08cd838 [acpi_thermal]
   29 0 0 0  WL  [irq19: bge1]
   28 0 0 0  WL  [irq16: bge0+]
   27 0 0 0  RL  CPU 1   [irq18: em1]
   26 0 0 0  WL  [irq17: em0]
   25 0 0 0  WL  [irq23: nve0]
   24 0 0 0  WL  [irq22: atapci2]
   23 0 0 0  WL  [irq21: atapci1]
   22 0 0 0  WL  [irq15: ata1]
   21 0 0 0  WL  [irq14: ata0]
   20 0 0 0  WL  [irq9: acpi0]
    9 0 0 0  SL  -        0xc645f080 [kqueue taskq]
   19 0 0 0  WL  [swi2: cambio]
    8 0 0 0  SL  -        0xc645f280 [acpi_task_2]
    7 0 0 0  SL  -        0xc645f280 [acpi_task_1]
    6 0 0 0  SL  -        0xc645f280 [acpi_task_0]
   18 0 0 0  WL  [swi5: +]
    5 0 0 0  SL  -        0xc645f400 [thread taskq]
   17 0 0 0  WL  [swi6: Giant taskq]
   16 0 0 0  WL  [swi6: task queue]
   15 0 0 0  RL  [yarrow]
4 0 0 0  RL  [g_down]
3 0 0 0  RL  [g_up]
2 0 0 0  RL  [g_event]
   14 0 0 0  WL  [swi3: vm]
   13 0 0 0  RL  CPU 0   [swi4: clock sio]
   12 0 0 0  WL  [swi1: net]
   11 0 0 0  RL  [idle: cpu0]
   10 0 0 0  RL  [idle: cpu1]
1 0 1