Robert Watson wrote:
Experience suggests that forwarding workloads see significant lock
contention in the routing and transmit queue code. The former needs
some kernel hacking to address in order to improve parallelism for
routing lookups. The latter is harder to address given the hardware
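A hedged sketch of the routing-lookup side of this: a read-mostly lock such as
FreeBSD's rmlock(9) lets lookups on all CPUs proceed in parallel while
serializing the rare table updates. struct route_entry and rt_table_find() are
hypothetical placeholders, not the actual routing code under discussion.

/*
 * Hedged kernel-side sketch only: a read-mostly lock protecting a lookup
 * structure.  struct route_entry and rt_table_find() are placeholders.
 */
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/rmlock.h>

struct route_entry;                                /* placeholder */
struct route_entry *rt_table_find(uint32_t dst);   /* placeholder lookup */

static struct rmlock rt_rmlock;

void
rt_lookup_init(void)
{
        rm_init(&rt_rmlock, "rt_rmlock");
}

struct route_entry *
rt_lookup(uint32_t dst)
{
        struct rm_priotracker tracker;
        struct route_entry *re;

        /* Readers on all CPUs run concurrently; writers are rare. */
        rm_rlock(&rt_rmlock, &tracker);
        re = rt_table_find(dst);
        rm_runlock(&rt_rmlock, &tracker);
        return (re);
}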
On Mon, 7 Jul 2008, Andre Oppermann wrote:
Robert Watson wrote:
Experience suggests that forwarding workloads see significant lock
contention in the routing and transmit queue code. The former needs some
kernel hacking to address in order to improve parallelism for routing
lookups. The lat
Ingo Flaschberger wrote:
Dear Paul,
I tried all of this :/ still, 256/512 descriptors seem to work the best.
Happy to let you log into the machine and fiddle around if you want :)
Yes, but I'm sure I will also not be able to achieve much more pps.
As it seems that you hit hardware-software-
Robert Watson wrote:
On Mon, 7 Jul 2008, Andre Oppermann wrote:
Robert Watson wrote:
Experience suggests that forwarding workloads see significant lock
contention in the routing and transmit queue code. The former needs
some kernel hacking to address in order to improve parallelism for
rou
Paul,
to get a systematic analysis of the performance please do the following
tests and put them into a table for easy comparison:
1. inbound pps w/o loss with interface in monitor mode (ifconfig em0 monitor)
2. inbound pps w/ fastforward into a single blackhole route
3. inbound pps w/ fast
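One hedged way to collect the pps numbers for such a table (an illustration,
not part of the requested procedure): sample net.inet.ip.stats once a second
and diff the counters. That covers the forwarding cases; for the monitor-mode
test the per-interface input counters (e.g. netstat -w 1) are the ones to
watch.

/*
 * Hedged measurement sketch: print input and forwarded pps once a second
 * by diffing the kernel's struct ipstat counters.
 */
#include <sys/param.h>
#include <sys/queue.h>
#include <sys/socket.h>
#include <sys/sysctl.h>
#include <netinet/in.h>
#include <netinet/in_systm.h>
#include <netinet/ip.h>
#include <netinet/ip_var.h>
#include <err.h>
#include <stdio.h>
#include <unistd.h>

static void
sample(struct ipstat *st)
{
        size_t len = sizeof(*st);

        if (sysctlbyname("net.inet.ip.stats", st, &len, NULL, 0) != 0)
                err(1, "sysctlbyname(net.inet.ip.stats)");
}

int
main(void)
{
        struct ipstat prev, cur;

        sample(&prev);
        for (;;) {
                sleep(1);
                sample(&cur);
                printf("in: %lu pps  forwarded: %lu pps\n",
                    (u_long)(cur.ips_total - prev.ips_total),
                    (u_long)(cur.ips_forward - prev.ips_forward));
                prev = cur;
        }
        /* NOTREACHED */
}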
Paul wrote:
SMP DISABLED on my Opteron 2212 (ULE, Preemption on)
Yields ~750kpps in em0 and out em1 (one direction)
I am miffed as to why this yields more pps than
a) with all 4 CPUs running, and b) 4 CPUs with lagg load-balanced over 3
incoming connections, so 3 taskq threads.
SMP adds quite some ov
On Mon, 7 Jul 2008, Andre Oppermann wrote:
Ingo Flaschberger wrote:
I don't think you will be able to route 64byte packets at 1gbit wirespeed
(2Mpps) with a current x86 platform.
You have to take the inter-frame gap and other overheads into account too.
That gives about 1.244 Mpps max on a 1GigE interface.
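The spread of figures in this subthread largely comes down to which per-frame
overheads get counted. A small arithmetic sketch using the standard Ethernet
constants (64-byte minimum frame including CRC, 8-byte preamble/SFD, 12-byte
inter-frame gap); which combination each poster had in mind may differ.

/*
 * Arithmetic sketch: maximum 1GigE packet rate depending on which
 * per-frame overheads are counted.
 */
#include <stdio.h>

int
main(void)
{
        const double linkbps = 1e9;     /* 1 Gbit/s */
        const int frame = 64;           /* minimum frame, incl. CRC */
        const int preamble = 8;         /* preamble + SFD */
        const int ifg = 12;             /* inter-frame gap */

        printf("frame only:             %.3f Mpps\n",
            linkbps / (8.0 * frame) / 1e6);
        printf("frame + IFG:            %.3f Mpps\n",
            linkbps / (8.0 * (frame + ifg)) / 1e6);
        printf("frame + preamble + IFG: %.3f Mpps\n",
            linkbps / (8.0 * (frame + preamble + ifg)) / 1e6);
        /* The last figure is the usual 1.488 Mpps wire-speed number. */
        return (0);
}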
On Mon, 7 Jul 2008, Andre Oppermann wrote:
Distributing the interrupts and taskqueues among the available CPUs gives
concurrent forwarding with bi- or multi-directional traffic. All incoming
traffic from any particular interface is still serialized though.
... although not on multiple input
Robert Watson wrote:
On Mon, 7 Jul 2008, Andre Oppermann wrote:
Distributing the interrupts and taskqueues among the available CPUs
gives concurrent forwarding with bi- or multi-directional traffic. All
incoming traffic from any particular interface is still serialized
though.
... although
Bruce Evans wrote:
On Mon, 7 Jul 2008, Andre Oppermann wrote:
Ingo Flaschberger wrote:
I don't think you will be able to route 64byte packets at 1gbit
wirespeed (2Mpps) with a current x86 platform.
You have to take the inter-frame gap and other overheads into account too.
That gives about 1.244 Mpps max on a
Andre Oppermann wrote:
Robert Watson wrote:
Experience suggests that forwarding workloads see significant lock
contention in the routing and transmit queue code. The former needs
some kernel hacking to address in order to improve parallelism for
routing lookups. The latter is harder to addre
On Mon, 7 Jul 2008, Andre Oppermann wrote:
Bruce Evans wrote:
What are the other overheads? I calculate 1.644 Mpps counting the
inter-frame gap, with 64-byte packets and (64 - header_size)-byte payloads.
If the 64 bytes is for the payload, then the max is much lower.
The theoretical maximum at 64byt
On Mon, 7 Jul 2008, Bruce Evans wrote:
I use low-end memory, but on the machine that does 640 kpps it somehow has
latency almost 4 times as low as on new FreeBSD cluster machines (~42 nsec
instead of ~150). perfmon (fixed for AXP and A64) and hwpmc report an
average of 11 k8-dc-misses per se
On Mon, 7 Jul 2008, Robert Watson wrote:
The last of these should really be quite a bit faster than the first of
these, but I'd be interested in seeing specific measurements for each if
that's possible!
And, if you're feeling particularly subject to suggestion, you might consider
comparing
On Mon, 7 Jul 2008, Robert Watson wrote:
Since you're doing fine-grained performance measurements of a code path that
interests me a lot, could you compare the cost per-send on UDP for the
following four cases:
(1) sendto() to a specific address and port on a socket that has been bound to
Synopsis: [fxp] fxp(4) driver failed to initialize device Intel 82801DB
State-Changed-From-To: open->feedback
State-Changed-By: gavin
State-Changed-When: Mon Jul 7 13:27:12 UTC 2008
State-Changed-Why:
To submitter: Could you give the output of "pciconf -l | grep fxp" please?
http://www.freebsd.o
On Mon, Jul 07, 2008 at 10:30:53PM +1000, Bruce Evans wrote:
> On Mon, 7 Jul 2008, Andre Oppermann wrote:
>
> > Bruce Evans wrote:
> >> What are the other overheads? I calculate 1.644Mpps counting the
> >> inter-frame
> >> gap, with 64-byte packets and 64-header_size payloads. If the 64 bytes
>
Bruce Evans wrote:
On Mon, 7 Jul 2008, Andre Oppermann wrote:
Bruce Evans wrote:
What are the other overheads? I calculate 1.644 Mpps counting the
inter-frame gap, with 64-byte packets and (64 - header_size)-byte payloads.
If the 64 bytes is for the payload, then the max is much lower.
The theoreti
On Mon, 7 Jul 2008, Bruce Evans wrote:
(1) sendto() to a specific address and port on a socket that has been bound to
INADDR_ANY and a specific port.
(2) sendto() to a specific address and port on a socket that has been bound to
a specific IP address (not INADDR_ANY) and a specific po
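A hedged userland sketch of this kind of per-send comparison, timing case (1)
(sendto() on a socket bound to INADDR_ANY and a fixed port) against send() on
a connected socket; the 127.0.0.1:9 destination, source port, 64-byte payload
and packet count are illustrative assumptions, not part of Robert's list.

/*
 * Hedged benchmark sketch: per-send cost of sendto() on an unconnected,
 * INADDR_ANY-bound socket versus send() on a connected socket.
 */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <err.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define NPKTS   100000

static double
now(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (ts.tv_sec + ts.tv_nsec / 1e9);
}

int
main(void)
{
        struct sockaddr_in src, dst;
        char payload[64];
        double t0, t1;
        int i, s;

        memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        dst.sin_len = sizeof(dst);
        dst.sin_port = htons(9);                  /* discard port: assumption */
        dst.sin_addr.s_addr = inet_addr("127.0.0.1");
        memset(payload, 0, sizeof(payload));

        if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
                err(1, "socket");

        /* Case (1): bind to INADDR_ANY and a specific (arbitrary) port. */
        memset(&src, 0, sizeof(src));
        src.sin_family = AF_INET;
        src.sin_len = sizeof(src);
        src.sin_port = htons(12345);
        src.sin_addr.s_addr = htonl(INADDR_ANY);
        if (bind(s, (struct sockaddr *)&src, sizeof(src)) != 0)
                err(1, "bind");

        /* Unconnected socket: sendto() supplies the destination each time. */
        t0 = now();
        for (i = 0; i < NPKTS; i++)
                (void)sendto(s, payload, sizeof(payload), 0,
                    (struct sockaddr *)&dst, sizeof(dst));
        t1 = now();
        printf("sendto(): %.0f ns/send\n", (t1 - t0) / NPKTS * 1e9);

        /* Connected socket: destination and route are set up once. */
        if (connect(s, (struct sockaddr *)&dst, sizeof(dst)) != 0)
                err(1, "connect");
        t0 = now();
        for (i = 0; i < NPKTS; i++)
                (void)send(s, payload, sizeof(payload), 0);
        t1 = now();
        printf("send():   %.0f ns/send\n", (t1 - t0) / NPKTS * 1e9);

        return (0);
}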
On Mon, 7 Jul 2008, Andre Oppermann wrote:
Paul,
to get a systematic analysis of the performance please do the following
tests and put them into a table for easy comparison:
1. inbound pps w/o loss with interface in monitor mode (ifconfig em0
monitor)
...
I won't be running many of these t
Bruce Evans wrote:
On Mon, 7 Jul 2008, Andre Oppermann wrote:
Paul,
to get a systematic analysis of the performance please do the following
tests and put them into a table for easy comparison:
1. inbound pps w/o loss with interface in monitor mode (ifconfig em0
monitor)
...
I won't be run
Hello List,
I've experienced the following with both a Kubuntu and a FBSD7 client and
FBSD7 as server:
When I try to copy a file off a *mounted* CIFS/SMB share I get transfer rates
below 1 MByte/sec. If I start a second, concurrent transfer I am getting
transfer rates around 8 MB/s on *each* tra
On Mon, 7 Jul 2008, Andre Oppermann wrote:
Bruce Evans wrote:
So it seems that the major overheads are not near the driver (as I already
knew), and upper layers are responsible for most of the cache misses.
The packet header is accessed even in monitor mode, so I think most of
the cache misses
The following reply was made to PR kern/123200; it has been noted by GNATS.
From: Alexander Motin <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
Cc:
Subject: Re: kern/123200: [netgraph] Server failure due to netgraph mpd and
dhcpclient
Date: Mon, 07 Jul 2008 21:27:58 +0300
If I
one that will later on handle the taskqueue to process the packets.
That adds overhead. Ideally the interrupt for each network interface
is bound to exactly one pre-determined CPU and the taskqueue is bound
to the same CPU. That way the overhead for interrupt and taskqueue
scheduling can be ke
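A hedged userland sketch of the binding described above, using FreeBSD's
cpuset_setaffinity(2) to pin a given process/thread to one CPU. Finding which
pid/tid belongs to a NIC's interrupt thread and its taskqueue is left out, and
the command-line arguments are assumptions for illustration.

/*
 * Hedged sketch: pin a process/thread to one CPU with cpuset_setaffinity(2).
 * Picking the right ithread/taskqueue thread id is not shown here.
 */
#include <sys/param.h>
#include <sys/cpuset.h>
#include <err.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
        cpuset_t mask;
        pid_t pid;
        int cpu;

        if (argc != 3)
                errx(1, "usage: pin <pid> <cpu>");
        pid = (pid_t)strtol(argv[1], NULL, 10);
        cpu = (int)strtol(argv[2], NULL, 10);

        CPU_ZERO(&mask);
        CPU_SET(cpu, &mask);

        /* Restrict the given pid to exactly the one CPU in the mask. */
        if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, pid,
            sizeof(mask), &mask) != 0)
                err(1, "cpuset_setaffinity");
        return (0);
}

The cpuset(1) utility offers the same functionality from the command line.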
I use low-end memory, but on the machine that does 640 kpps it somehow
has latency almost 4 times as low as on new FreeBSD cluster machines
(~42 nsec instead of ~150). perfmon (fixed for AXP and A64) and hwpmc
report an average of 11 k8-dc-misses per sendto() while sending via
bge at 640 kpps.
On Tue, 8 Jul 2008, Bruce Evans wrote:
On Mon, 7 Jul 2008, Andre Oppermann wrote:
Bruce Evans wrote:
So it seems that the major overheads are not near the driver (as I already
knew), and upper layers are responsible for most of the cache misses.
The packet header is accessed even in monitor m
Hi,
As was already mentioned, we can't avoid all cache misses as there's
data that's recently been updated in memory via DMA and therefore
kicked out of cache.
However, we may hide some of the latency penalty by prefetching
'interesting' data early. I.e. we know that we want to access some
etherne
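A minimal sketch of that idea, assuming GCC's __builtin_prefetch and a
hypothetical per-packet list: kick off the prefetch for the next frame's
DMA-written (hence cache-cold) header while useful work is still being done on
the current one.

/*
 * Hedged sketch of the prefetch idea.  struct pkt, drain_ring() and
 * process_packet() are hypothetical; the point is only the placement of
 * __builtin_prefetch() for the next, cache-cold header.
 */
#include <stddef.h>

struct pkt {
        struct pkt *next;
        void       *data;       /* points at the received frame */
        size_t      len;
};

void process_packet(void *data, size_t len);    /* assumed consumer */

void
drain_ring(struct pkt *head)
{
        struct pkt *p, *n;

        for (p = head; p != NULL; p = n) {
                n = p->next;
                if (n != NULL) {
                        /*
                         * Start pulling the next frame's header into cache
                         * now; the miss is (partly) hidden behind the work
                         * done on the current packet.
                         */
                        __builtin_prefetch(n->data, 0, 1);
                }
                process_packet(p->data, p->len);
        }
}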
At 02:44 PM 7/7/2008, Paul wrote:
Also my 82571 NIC supports multiple received queues and multiple
transmit queues so why hasn't
anyone written the driver to support this? It's not a 10gb card and
it still supports it and it's widely
Intel actually maintains the driver. Not sure if there are
I hope so, but if they maintain the driver then why wouldn't they make it
take advantage of their own hardware?
I hope they are stuck focusing on windows users :/
Mike Tancsa wrote:
At 02:44 PM 7/7/2008, Paul wrote:
Also my 82571 NIC supports multiple received queues and multiple
transmit queues
Artem Belevich wrote:
Hi,
As was already mentioned, we can't avoid all cache misses as there's
data that's recently been updated in memory via DMA and therefore
kicked out of cache.
However, we may hide some of the latency penalty by prefetching
'interesting' data early. I.e. we know that we wan
On 2008-Jul-07 13:25:13 -0700, Julian Elischer <[EMAIL PROTECTED]> wrote:
>what you need is a speculative prefetch where you can tell the
>processor "We will probably need the following address so start
>getting it while we go do other stuff".
This looks like the PREFETCH instructions that exist
Peter Jeremy wrote:
On 2008-Jul-07 13:25:13 -0700, Julian Elischer <[EMAIL PROTECTED]> wrote:
what you need is a speculative prefetch where you can tell the
processor "We will probably need the following address so start
getting it while we go do other stuff".
This looks like the PREFETCH inst
> Prefetching when you are waiting for the data isn't a help.
Agreed. You've got to start the prefetch enough ns
before you actually need the data and move on doing other things that
do not depend on the data you've just started prefetching.
> what you need is a speculative prefetch where you can tell the
We could add this as a part of the fastforwarding code and turn it on for a
router and leave it off for a server.
When I use a FBSD box as a router, it doesn't do anything else, so
there could be two optimized paths, that is, one for
routing/forwarding/firewalling only
and one for use as a serv
Mike Tancsa wrote:
At 02:44 PM 7/7/2008, Paul wrote:
Also my 82571 NIC supports multiple received queues and multiple
transmit queues so why hasn't
anyone written the driver to support this? It's not a 10gb card and
it still supports it and it's widely
Intel actually maintains the driver. Not
Artem Belevich wrote:
Prefetching when you are waiting for the data isn't a help.
Agreed. You've got to start the prefetch enough ns
before you actually need the data and move on doing other things that
do not depend on the data you've just started prefetching.
what you need is a speculative prefetch
At 02:44 PM 7/7/2008, Paul wrote:
Also my 82571 NIC supports multiple received queues and multiple
transmit queues so why hasn't
anyone written the driver to support this? It's not a 10gb card and
it still supports it and it's widely
available and not too expensive either. The new 82575/6 ch
I read through the IGB driver, and it says 82575/6 only... which is the
new chip Intel is releasing on cards this month (2-port)
and in October (4-port), but the chips are on some of the motherboards right now.
Why can't it also use the 82571? It doesn't make any sense. I haven't
tried it but just
On 2008-Jul-07 16:15:40 +, Achim <[EMAIL PROTECTED]> wrote:
>Performance with a single client is degraded when the client is smbmount and
>downloading.
>With a second transfer in any direction, performance becomes better, to about
>3.5 or 8 MB/s respectively, depending on whether the second connection is up- or d