On 24/09/2013 08:58, Marko Zec wrote:
On Tuesday 24 September 2013 00:46:46 Sami Halabi wrote:
Hi,
http://info.iet.unipi.it/~luigi/papers/20120601-dxr.pdf
http://www.nxlab.fer.hr/dxr/stable_8_20120824.diff
I've tried the diff on 10-current; it applied cleanly, but I got errors
compiling the new kernel... Is there any work underway to make it build?
I'd love to test it.
Even if you made it compile on current, you could only run synthetic tests
measuring lookup performance using streams of random keys, as outlined in
the paper (btw. the paper at Luigi's site is an older draft; the final
version with slightly revised benchmarks is available here:
http://www.sigcomm.org/sites/default/files/ccr/papers/2012/October/2378956-2378961.pdf)
I.e., the code only hooks into the routing API for testing purposes, but is
completely disconnected from the forwarding path.
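In other words, the synthetic test boils down to something like this
minimal userland sketch (dxr_lookup() is a stand-in for the routine under
test; the key generation and reporting are assumptions, not what the patch
actually does):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

uint16_t dxr_lookup(uint32_t dst);      /* assumed: routine under test */

int
main(void)
{
        enum { N = 10 * 1000 * 1000 };
        uint32_t *keys = malloc(N * sizeof(*keys));
        struct timespec t0, t1;
        volatile uint16_t sink = 0;
        int i;

        if (keys == NULL)
                return (1);

        /* Pre-generate the random key stream so the timed loop measures
         * only the lookups, not the PRNG. */
        srandom(time(NULL));
        for (i = 0; i < N; i++)
                keys[i] = ((uint32_t)random() << 16) ^ (uint32_t)random();

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < N; i++)
                sink ^= dxr_lookup(keys[i]);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%.1f M lookups/s\n", N / s / 1e6);
        (void)sink;             /* volatile sink keeps the loop alive */
        return (0);
}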
Aha! How much work would it be to make it usable for actual forwarding?
We have a prototype in the works which combines DXR with netmap in
userspace and is capable of sustaining forwarding at well above line rate
with full-sized BGP views, using Intel 10G cards on commodity multicore
machines. The work somewhat stalled during the summer, but I plan to wrap
it up and release the code by the end of this year. With recent advances in
netmap it might also be feasible to merge DXR and netmap entirely inside
the kernel, but I've not explored that path yet...
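To give a feel for the shape of such a prototype, here is a rough sketch
of a netmap userspace forwarding loop (port setup via NIOCREGIF/mmap is
omitted; extract_dst_ip() and dxr_lookup() are assumed helpers, not part
of the netmap API):

#include <sys/ioctl.h>
#include <poll.h>
#include <stdint.h>
#include <net/netmap_user.h>    /* NETMAP_BUF, nm_ring_next, ... */

uint32_t extract_dst_ip(const char *buf, uint16_t len); /* assumed */
int      dxr_lookup(uint32_t dst);                      /* assumed */

struct port { struct netmap_if *nifp; int fd; }; /* from NIOCREGIF+mmap */

static void
forward_loop(struct port *in, struct port *out)
{
        struct pollfd pfd = { .fd = in->fd, .events = POLLIN };

        for (;;) {
                poll(&pfd, 1, -1);      /* wait for received frames */
                struct netmap_ring *rx = NETMAP_RXRING(in->nifp, 0);
                struct netmap_ring *tx = NETMAP_TXRING(out->nifp, 0);

                while (!nm_ring_empty(rx) && !nm_ring_empty(tx)) {
                        struct netmap_slot *rs = &rx->slot[rx->cur];
                        struct netmap_slot *ts = &tx->slot[tx->cur];
                        char *buf = NETMAP_BUF(rx, rs->buf_idx);

                        /* Route on the destination address; in a real
                         * forwarder the result selects the egress port. */
                        (void)dxr_lookup(extract_dst_ip(buf, rs->len));

                        /* Zero-copy forward: swap buffer indices. */
                        uint32_t idx = ts->buf_idx;
                        ts->buf_idx = rs->buf_idx;
                        rs->buf_idx = idx;
                        ts->len = rs->len;
                        ts->flags |= NS_BUF_CHANGED;
                        rs->flags |= NS_BUF_CHANGED;

                        rx->head = rx->cur = nm_ring_next(rx, rx->cur);
                        tx->head = tx->cur = nm_ring_next(tx, tx->cur);
                }
                ioctl(out->fd, NIOCTXSYNC, NULL);  /* flush queued frames */
        }
}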
mmm, forwarding using netmap would be pretty awesome...
Marko
Sami
On Sun, Sep 22, 2013 at 11:12 PM, Alexander V. Chernikov <melif...@yandex-team.ru> wrote:
On 29.08.2013 15:49, Adrian Chadd wrote:
Hi,
Hello Adrian!
I'm very sorry for the looong reply.
There's a lot of good stuff to review here, thanks!
Yes, the ixgbe RX lock needs to die in a fire. It's kinda pointless to
keep locking things like that on a per-packet basis. We should be able
to do this in a cleaner way - we can defer RX into a CPU-pinned
taskqueue and convert the interrupt handler to a fast handler that
just schedules that taskqueue. We can ignore the ithread entirely
here.
What do you think?
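Roughly, the shape could be something like this (ix_rxq, ix_rx_filter()
and ixgbe_rxeof() here are stand-ins for the real driver structures, and
pinning the taskqueue thread to the queue's CPU is left out):

#include <sys/param.h>
#include <sys/bus.h>
#include <sys/kernel.h>
#include <sys/malloc.h>
#include <sys/priority.h>
#include <sys/taskqueue.h>

struct ix_rxq {
        struct task      rx_task;
        struct taskqueue *rx_tq;
        /* ... ring state ... */
};

static void ixgbe_rxeof(struct ix_rxq *);       /* assumed: ring drain */

static int
ix_rx_filter(void *arg)
{
        struct ix_rxq *rxq = arg;

        /* Fast handler: no locks, no work; just kick the taskqueue. */
        taskqueue_enqueue(rxq->rx_tq, &rxq->rx_task);
        return (FILTER_HANDLED);
}

static void
ix_rx_task(void *arg, int pending)
{
        struct ix_rxq *rxq = arg;

        ixgbe_rxeof(rxq);       /* drain the ring, no per-packet locks */
}

static void
ix_rxq_attach(device_t dev, struct ix_rxq *rxq, struct resource *irq)
{
        void *cookie;

        TASK_INIT(&rxq->rx_task, 0, ix_rx_task, rxq);
        rxq->rx_tq = taskqueue_create_fast("ix_rx", M_NOWAIT,
            taskqueue_thread_enqueue, &rxq->rx_tq);
        taskqueue_start_threads(&rxq->rx_tq, 1, PI_NET, "%s rx",
            device_get_nameunit(dev));
        bus_setup_intr(dev, irq, INTR_TYPE_NET | INTR_MPSAFE,
            ix_rx_filter, NULL, rxq, &cookie);
}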
Well, it sounds good :) But performance numbers and Jack's opinion are
more important :)
Are you going to Malta?
Totally pie-in-the-sky handwaving at this point:
* create an array of mbuf pointers for completed mbufs;
* populate the mbuf array;
* pass the array up to ether_demux().
For vlan handling, it may end up populating its own list of mbufs to
push up to ether_demux(). So maybe we should extend the API to have a
bitmap of packets to actually handle from the array, so we can pass up
a larger array of mbufs, note which ones are for the destination and
then the upcall can mark which frames it has consumed.
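As a strawman, the extended API could look like this (ether_demux_batch(),
eth_batch and frame_is_for_us() are made-up names; only ether_demux()
itself exists today):

#include <sys/param.h>
#include <sys/mbuf.h>
#include <net/if.h>
#include <net/if_var.h>
#include <net/ethernet.h>

#define ETH_BATCH       32

struct eth_batch {
        struct mbuf     *pkts[ETH_BATCH];
        int              count;
        uint32_t         todo;  /* bit i set: pkts[i] still unclaimed */
};

static int frame_is_for_us(struct ifnet *, struct mbuf *); /* made up */

/* Each consumer sweeps the array, claims the frames addressed to it and
 * clears their bits; whatever is left in 'todo' goes to the next layer. */
static void
ether_demux_batch(struct ifnet *ifp, struct eth_batch *b)
{
        int i;

        for (i = 0; i < b->count; i++) {
                if ((b->todo & (1u << i)) == 0)
                        continue;
                if (frame_is_for_us(ifp, b->pkts[i])) {
                        ether_demux(ifp, b->pkts[i]); /* existing entry */
                        b->todo &= ~(1u << i);
                }
        }
}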
I specifically wonder how much work/benefit we may see by doing:
* batching packets into lists so various steps can batch process
things rather than run to completion;
* batching the processing of a list of frames under a single lock
instance - e.g., if the forwarding code could do the forwarding lookup
for 'n' packets under a single lock, then pass that list of frames up
to inet_pfil_hook() to do the work under one lock, etc. (see the
sketch after this list).
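Purely illustrative, the lookup half might look like this (lookup_egress()
is an assumed helper, and the actual lock macro names in net/radix.h may
differ):

#include <sys/param.h>
#include <sys/mbuf.h>
#include <net/if.h>
#include <net/if_var.h>
#include <net/radix.h>

#define FWD_BATCH       32

struct fwd_batch {
        struct mbuf     *pkts[FWD_BATCH];
        struct ifnet    *egress[FWD_BATCH];     /* filled by the lookup */
        int              count;
};

/* Assumed helper: one radix lookup, returns the egress ifp (or NULL). */
static struct ifnet *lookup_egress(struct radix_node_head *, struct mbuf *);

/* The point of the exercise: one lock acquisition amortized over n
 * lookups instead of lock/unlock per packet. */
static void
fwd_lookup_batch(struct radix_node_head *rnh, struct fwd_batch *b)
{
        int i;

        RADIX_NODE_HEAD_RLOCK(rnh);
        for (i = 0; i < b->count; i++)
                b->egress[i] = lookup_egress(rnh, b->pkts[i]);
        RADIX_NODE_HEAD_RUNLOCK(rnh);
}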
I'm thinking the same way, but we're stuck with the 'forwarding lookup'
due to the problem with the egress interface pointer, as I mentioned
earlier. However, it would be interesting to see how much it helps,
regardless of locking.
Currently I'm thinking that we should try to change the radix code to
something different (it seems this can be evaluated quickly) and see what
happens. Luigi's performance numbers for our radix are pretty awful, and
there is a patch implementing an alternative trie:
http://info.iet.unipi.it/~luigi/papers/20120601-dxr.pdf
http://www.nxlab.fer.hr/dxr/stable_8_20120824.diff
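For reference, the lookup in that scheme (DXR's D16R variant: a 2^16-entry
direct table plus binary search over range fragments) boils down to roughly
the following; field names and widths here are illustrative, not the ones
in the diff:

#include <stdint.h>

struct direct_entry {
        uint32_t base    : 19;  /* offset into range table, or next hop */
        uint32_t frag    : 12;  /* number of ranges in the fragment */
        uint32_t is_leaf : 1;   /* 1: base holds the next hop itself */
};

struct range_entry {
        uint16_t start;         /* low 16 bits where this range begins */
        uint16_t nexthop;
};

static uint16_t
dxr_lookup(const struct direct_entry *dt, const struct range_entry *rt,
    uint32_t dst)
{
        struct direct_entry de = dt[dst >> 16];

        if (de.is_leaf)
                return (de.base);  /* one next hop for the whole chunk */

        /* Binary search for the last range starting at or below the key;
         * assumes frag >= 1 and the fragment covers the whole chunk. */
        const struct range_entry *r = rt + de.base;
        uint32_t lo = 0, hi = de.frag - 1, key = dst & 0xffff;

        while (lo < hi) {
                uint32_t mid = (lo + hi + 1) / 2;
                if (r[mid].start <= key)
                        lo = mid;
                else
                        hi = mid - 1;
        }
        return (r[lo].nexthop);
}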
Here, the processing would look less like "grab lock and process to
completion" and more like "mark and sweep" - i.e., we have a list of
frames that we mark as needing processing and mark as having been
processed at each layer, so we know where to dispatch them next.
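Expressed as data, that might look like this (entirely illustrative):

#include <sys/param.h>
#include <sys/mbuf.h>

/* Each frame carries a marker for the next layer that has to look at it;
 * each layer sweeps the batch and advances only its own frames. */
enum frame_stage { FS_ETHER, FS_IP_LOOKUP, FS_PFIL, FS_TX, FS_DONE };

struct marked_batch {
        struct mbuf     *pkts[32];
        enum frame_stage stage[32];
        int              count;
};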
I still have some tool coding to do with PMC before I even think about
tinkering with this, as I'd like to measure stuff like per-packet
latency as well as top-level processing overhead (i.e.,
CPU_CLK_UNHALTED.THREAD_P / lagg0 TX bytes/pkts, RX bytes/pkts, NIC
interrupts on that core, etc.)
That will be great to see!
Thanks,
-adrian
--
Sami Halabi
Information Systems Engineer
NMS Projects Expert
FreeBSD SysAdmin Expert
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"