On 18.08.2013 23:54, Adrian Chadd wrote:
Hi,
I think the "UNIX architecture" is a bit broken for anything other than the
occasional (for various traffic levels defining "occasional!") traffic
connection. It's serving us well purely through the sheer force of will of
modern CPU power but I think we can do a lot better.
I do not agree with you here. The UNIX architecture is fine, but of course,
as with anything, you're not going to get the full raw and theoretically
possible performance for every special case out of it. It is extremely
versatile and performs rather well over a broad set of applications.
_I_ think the correct model is a netmap model - batched packet handling,
lightweight drivers pushing and pulling batches of things, with some
lightweight plugins to service that inside the kernel and/or push into the
netmap ring buffer in userland. Interfacing into the ethernet and socket
layer should be something that bolts on the side, kind of netgraph style.
It would likely look a lot more like a switching backplane with socket IO
being one of many processing possibilities. If socket IO stays packet at a
time then great; but that's messing up the ability to do a lot of other
interesting things.
Not really. While netmap is really good at pushing packets (only on cache
coherent x86 architectures, I may add), it fails miserably as a general
"socket" layer.
On the receive side it has a fixed buffer pool and would grind to a halt if
you were directly using those buffers as TCP receive socket buffers and the
TCP application stopped processing every packet immediately. This means you
have to copy the packet contents from the NIC DMA pool to some other
allocated memory to prevent that.
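
To make the receive-side problem concrete, here is a minimal sketch of a
fixed pool and the copy that frees it up. This is not netmap code; the
names (dma_pool, pool_rx, pool_copy_out) and the sizes are made up for
illustration, and a real DMA pool is of course filled by hardware, not by
memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define POOL_SLOTS 4     /* hypothetical size of the fixed NIC buffer pool */
#define SLOT_BYTES 2048  /* hypothetical size of one buffer */

struct dma_pool {
    unsigned char slot[POOL_SLOTS][SLOT_BYTES];
    size_t len[POOL_SLOTS];
    int in_use[POOL_SLOTS];
};

/* "NIC" side: place an arriving packet in a free slot.
 * Returns the slot index, or -1 when the pool is exhausted (packet dropped). */
int pool_rx(struct dma_pool *p, const void *pkt, size_t len)
{
    if (len > SLOT_BYTES)
        return -1;
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (!p->in_use[i]) {
            memcpy(p->slot[i], pkt, len);
            p->len[i] = len;
            p->in_use[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Socket-buffer side: copy the payload into separately allocated memory
 * and hand the slot back to the pool right away. The caller frees the copy. */
void *pool_copy_out(struct dma_pool *p, int i, size_t *len)
{
    assert(i >= 0 && i < POOL_SLOTS && p->in_use[i]);
    void *copy = malloc(p->len[i] ? p->len[i] : 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, p->slot[i], p->len[i]);
    *len = p->len[i];
    p->in_use[i] = 0;
    return copy;
}
```

If the application never reads, pool_rx() starts returning -1 after
POOL_SLOTS packets. Calling pool_copy_out() on arrival keeps the pool
available, at the price of exactly the data copy described above.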
It doesn't have any security model and isn't really multi-app aware. How
do you multiplex a number of protocols and connections to different
applications? Copy through shared memory?
You'd have to re-implement the entire protocol stack starting with ethernet,
IPv4 and IPv6 up to UDP and TCP, the latter being rather complex. For sending
you need a routing table and ARP to be managed.
For data sending you run into the same buffer problem as with receive. TCP
has to hold on to the data sent until it is acknowledged. That's a data copy
again because you can't store it in the NIC DMA pool. Memory pools (mbufs)
then need to be allocated and managed as well, not to mention page fault
issues (userspace).
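
As a rough sketch of why the send side needs its own copy: the sender has
to keep every byte until a cumulative ACK covers it. The structure and
function names below (sndbuf, snd_append, snd_ack) are hypothetical and
far simpler than a real mbuf-based socket buffer; it only illustrates the
hold-until-acknowledged rule.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical send buffer: holds bytes that were sent but not yet ACKed. */
struct sndbuf {
    unsigned char *data;
    size_t len;      /* number of unacknowledged bytes held */
    uint32_t una;    /* sequence number of the first unacknowledged byte */
};

/* Copy the application's data in, so its buffer (or a NIC slot) can be
 * reused immediately. Returns 0 on success, -1 on allocation failure. */
int snd_append(struct sndbuf *sb, const void *buf, size_t n)
{
    if (n == 0)
        return 0;
    unsigned char *p = realloc(sb->data, sb->len + n);
    if (p == NULL)
        return -1;
    memcpy(p + sb->len, buf, n);
    sb->data = p;
    sb->len += n;
    return 0;
}

/* Cumulative ACK: release everything below `ack`. Returns bytes released;
 * duplicate, stale, or out-of-range ACKs release nothing. */
size_t snd_ack(struct sndbuf *sb, uint32_t ack)
{
    uint32_t acked = ack - sb->una;  /* unsigned math copes with wraparound */
    if (acked == 0 || acked > sb->len)
        return 0;
    memmove(sb->data, sb->data + acked, sb->len - acked);
    sb->len -= acked;
    sb->una = ack;
    return acked;
}
```

Until snd_ack() releases them, the bytes must stay retransmittable, which
is why they cannot simply live in a small, fixed NIC DMA pool.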
Once you're through all that you end up with the UNIX style kernel stack
moved into a userspace library.
That's why I'm (more) interested in what you've done architecture-wise than
just saying "dump it in userland and be done with it." I think the VALE
kernel stuff is very interesting from an architectural perspective. The
questions (to me!) are:
* how do we implement this in the current framework? (That's not too scary
though; we'd just have the existing ethernet input/output path be one of
many processing modules, and VALE would be another; netmap-userland would
be another; etc, etc);
* how do we make it a compile time fallback to the traditional model, for
platforms that continue to be memory and/or cache constrained? (read:
everything that's embedded)
* ... and not simply have lots of #ifdef NETMAP everywhere, but make the
fallback be something sane and fall out of the API design?
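
One way the questions above could be approached, purely as a sketch: every
input path is a module with the same batch interface, and the compile-time
choice is confined to a single selection point. None of these names
(pkt_module, legacy_batch_input, switch_batch_input, WITH_BATCH_SWITCH)
exist in the tree; they are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

struct pkt {
    const void *data;
    size_t len;
};

/* A processing module consumes a batch and reports how many packets it took. */
struct pkt_module {
    const char *name;
    size_t (*input)(struct pkt *batch, size_t n);
};

/* Traditional path: one packet at a time. Here it just accepts non-empty
 * packets; a real path would hand them to the protocol layers. */
static size_t legacy_input_one(struct pkt *p)
{
    return p->len > 0 ? 1 : 0;
}

/* The packet-at-a-time path wrapped as a batch consumer. */
static size_t legacy_batch_input(struct pkt *batch, size_t n)
{
    size_t done = 0;
    for (size_t i = 0; i < n; i++)
        done += legacy_input_one(&batch[i]);
    return done;
}

static const struct pkt_module legacy_module = { "legacy", legacy_batch_input };

#ifdef WITH_BATCH_SWITCH  /* hypothetical build option */
/* Batched path: would process the whole batch in one pass. */
static size_t switch_batch_input(struct pkt *batch, size_t n)
{
    size_t done = 0;
    for (size_t i = 0; i < n; i++)
        if (batch[i].len > 0)
            done++;
    return done;
}
static const struct pkt_module switch_module = { "switch", switch_batch_input };
static const struct pkt_module *default_module = &switch_module;
#else
/* Constrained platforms fall back to the traditional model. */
static const struct pkt_module *default_module = &legacy_module;
#endif

/* Drivers call only this; they never see the #ifdef. */
size_t pkt_dispatch(struct pkt *batch, size_t n)
{
    assert(batch != NULL || n == 0);
    return default_module->input(batch, n);
}
```

The driver-facing call is the same in both builds, so the fallback is just
a different module behind pkt_dispatch() rather than conditional code
spread through every driver.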
Netmap really excels at pushing packets. I think a recent extension allows
a netmap process to push a netmap-received packet back into the kernel for
further processing. That's a good hybrid model for those use cases that
need raw packet pushing speed and have only little local traffic.
I'll try to rope some more ideas into that design at the Cambridge and
EuroBSD developer summits. I'll try to post some kind of work roadmap to the
list(s) for comments and potential code hacking.
Anyway. I'll continue waving hands and hacking on code until I have
something that works.
Rather than day-dreaming of shiny new things we should invest in making the
kernel better and in fixing or removing bottlenecks. It's a good kernel.
Luigi - when are you next at a BSD developer summit / conference? Will you
be at Malta?
He submitted a talk about netmap and it was accepted, so I surely do hope
that he shows up. ;)
--
Andre