Ok, now THIS is absolutely a whole bunch of ridiculousness..
I set up etherchannel, and I'm evenly distributing packets over em0, em1, and em2 into lagg0, and I get WORSE performance than with a single interface. Can anyone explain this one? This is horrible. The em0-em2 taskqs are using 80% CPU EACH and they are only doing 100 kpps EACH.
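For reference, the lagg side is just the stock lagg(4) setup, roughly like this (from memory, so treat it as a sketch; the loadbalance protocol and the address are placeholders for whatever is actually configured):

    kldload if_lagg
    ifconfig lagg0 create
    ifconfig em0 up
    ifconfig em1 up
    ifconfig em2 up
    ifconfig lagg0 laggproto loadbalance laggport em0 laggport em1 laggport em2
    ifconfig lagg0 inet 10.0.0.1/24 up    # placeholder address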

Here's what the interfaces look like:

          input          (em0)           output
  packets  errs      bytes    packets  errs      bytes colls
   105050 11066    6303000          0     0          0     0
   104952 13969    6297120          0     0          0     0
   104331 12121    6259860          0     0          0     0

          input          (em1)           output
  packets  errs      bytes    packets  errs      bytes colls
   103734 70658    6223998          0     0          0     0
   103483 75703    6209046          0     0          0     0
   103848 76195    6230886          0     0          0     0


          input          (em2)           output
  packets  errs      bytes    packets  errs      bytes colls
   103299 62957    6197940          1     0        226     0
   106388 73071    6383280          1     0        178     0
   104503 70573    6270180          4     0        712     0
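(Those are one-second samples per interface, i.e. roughly this for each NIC; the errs column is the input-error/drop counter:)

    netstat -w 1 -I em0
    netstat -w 1 -I em1
    netstat -w 1 -I em2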

last pid: 1378; load averages: 2.31, 1.28, 0.57 up 0+00:06:27 17:42:32
68 processes:  8 running, 42 sleeping, 18 waiting
CPU:  0.0% user,  0.0% nice, 58.9% system,  0.0% interrupt, 41.1% idle
Mem: 7980K Active, 5932K Inact, 47M Wired, 16K Cache, 8512K Buf, 1920M Free
Swap: 8192M Total, 8192M Free

 PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
  11 root     171 ki31     0K    16K RUN    2   5:18 80.47% idle: cpu2
  38 root     -68    -     0K    16K CPU3   3   2:30 80.18% em2 taskq
  37 root     -68    -     0K    16K CPU1   1   2:28 76.90% em1 taskq
  36 root     -68    -     0K    16K CPU2   2   2:28 72.56% em0 taskq
  13 root     171 ki31     0K    16K RUN    0   3:32 29.20% idle: cpu0
  12 root     171 ki31     0K    16K RUN    1   3:29 27.88% idle: cpu1
  10 root     171 ki31     0K    16K RUN    3   3:21 25.63% idle: cpu3
  39 root     -68    -     0K    16K -      3   0:32 17.68% em3 taskq
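(That snapshot is just top with system processes and kernel threads shown, so each em taskq appears as its own thread:)

    top -SH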


See, that's just wrong.. something is very wrong here. Does anyone have any ideas? I really need to get this working. I figured that evenly distributing the packets over 3 interfaces would simulate having 3 RX queues, since each interface gets its own taskq thread; instead the result is WAY more CPU usage and a little over half the pps throughput of a single port..

If anyone is interested in tackling some of these issues, please e-mail me. It would be greatly appreciated.


Paul



Julian Elischer wrote:
Paul wrote:
ULE without PREEMPTION is now yielding better results.
        input          (em0)           output
  packets  errs      bytes    packets  errs      bytes colls
   571595 40639   34564108          1     0        226     0
   577892 48865   34941908          1     0        178     0
   545240 84744   32966404          1     0        178     0
   587661 44691   35534512          1     0        178     0
   587839 38073   35544904          1     0        178     0
   587787 43556   35540360          1     0        178     0
   540786 39492   32712746          1     0        178     0
   572071 55797   34595650          1     0        178     0
OUCH, IPFW HURTS.. loading ipfw and adding one rule, 'allow ip from any to any', drops 100 kpps :/ What's up with THAT? Unloading the ipfw module gets the 100 kpps back again. That's not right with ONE rule.. :/
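(For the record, the test was nothing fancy; roughly this, with the rule number being arbitrary:)

    kldload ipfw
    ipfw add 100 allow ip from any to any    # one pass-everything rule
    # and to back it out again:
    ipfw -f flush
    kldunload ipfw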

ipfw needs to gain a lock on the firewall before running,
and is quite complex.. I can believe it..

In FreeBSD 4.8 I was able to use ipfw and filter 1Gb between two interfaces (bridged), but I think it has slowed down since then due to the SMP locking.



The em0 taskq is still jumping CPUs.. is there any way to lock it to one CPU, or is this just a function of ULE?

Running 'tar czpvf all.tgz *' and seeing if the pps changes..
negligible.. guess the scheduler is doing its job at least..

Hmm. Even when it's getting 50-60k errors per second on the interface I can still scp a file through that interface, although it's not fast.. 3-4 MB/s..

You know, I wouldn't care if it added 5 ms of latency to the packets when it was doing 1 Mpps, as long as it didn't drop any.. Why can't it do that? Queue them up and process them in big chunks so none are dropped.. hmm?
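(That is more or less what the em interrupt-moderation knobs are supposed to buy you; if I have the names right these are loader tunables, and the values below are just guesses to experiment with, not recommendations:)

    # /boot/loader.conf -- bigger rings plus delayed RX interrupts = larger batches
    hw.em.rxd="4096"
    hw.em.txd="4096"
    hw.em.rx_int_delay="200"
    hw.em.rx_abs_int_delay="400"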

The 32-bit system is compiling now.. it won't do > 400 kpps with a GENERIC kernel, whereas the 64-bit one did 450k with GENERIC, although that could be
the difference between the Opteron 270 and the Opteron 2212..

Paul

_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
