On Wed, 2006-07-26 at 23:00 -0700, David Miller wrote:
> From: Rusty Russell <[EMAIL PROTECTED]>
> Date: Thu, 27 Jul 2006 15:46:12 +1000
>
> > Yes, my first thought back in January was how netfilter would
> > interact with this in a sane way.  One answer is "don't": once
> > someone registers on any hook we go into slow path.  Another is to
> > run the hooks in socket context, which is better, but precludes
> > having the consumer in userspace, which still appeals to me 8)
>
> Small steps, small steps.  I have not ruled out userspace TCP just
> yet, but we are not prepared to go there right now anyways.  It is
> just the same kind of jump to go to kernel level netchannels as it is
> to go from kernel level netchannels to userspace netchannel based TCP.
I think I was unclear; the possibility of userspace netchannels adds
weight to the idea that we should rework the netfilter hooks sooner
rather than later.

> > What would the tuple look like?  Off the top of my head:
> > SRCIP/DSTIP/PROTO/SPT/DPT/IN/OUT (where IN and OUT are boolean
> > values indicating whether the src/dest is local).
> >
> > Of course, it means rewriting all the userspace tools,
> > documentation, and creating a complete new infrastructure for
> > connection tracking and NAT, but if that's what's required, then
> > so be it.
>
> I think we are able to finally talk seriously about revamping
> netfilter on this level because we finally have a good incentive to
> do so and some kind of model exists to work against.  Robert's trie
> might be able to handle your tuple very well, fwiw, perhaps even
> with prefixing.
>
> But something occurs to me.  Socket has ID when it is created and
> goes to established state.  This means we have this tuple, and thus
> we can prelookup the netfilter rule and attach this cached lookup
> state on the socket.  Your tuple in this case is defined to be:
>
>     SRCIP/DSTIP/"TCP"/SPT/DPT/0/1
>
> I do not know how practical this is, it is just some suggestion.
>
> Would there be prefixing in these tuples?  That's where the trouble
> starts.  If you add prefixing, troubles and limitations of lookup of
> today reappear.  If you disallow prefixing, tables get very large
> but lookup becomes simpler and practical.

OK.  AFAICT there are three ideas in play here (ignoring netchannels).
First, that there should be a unified lookup for efficiency (a Grand
Unified Cache).  Secondly, that netfilter hook users need to publish
information about what they are actually looking at if they are to
use this lookup.  Thirdly, that smart NICs can accelerate the lookup.

(1) I am imagining some Grand Unified Flow Cache (Olsson trie?) that
holds (some subset of?) flows.  A successful lookup immediately after
the packet comes off the NIC gives the destiny for the packet: what
route, (optionally) what socket, what filtering, what connection
tracking (& what NAT), etc.  I don't know if this should be a general
array of fn & data ptrs, or specialized fields for each one, or a mix
(rough sketch below).  Maybe there's a "too hard, do slow path" bit,
or maybe hard cases just never get put in the cache.  Perhaps we need
a separate cache for locally-generated packets, a la
ip_route_output().  Anyway, we trade slightly more expensive flow
setup for faster packet processing within flows.

(2) To make this work sanely in the presence of netfilter hooks, we
need the hooks to register the tuples they are interested in.  Not at
the hook level, but *in addition*.  For example, we need to know what
flows each packet filtering rule cares about.  Connection tracking
wants to see the first packet (and the first reply packet), but after
that probably only wants to see packets with RST/SYN/FIN set.  (Erk,
window tracking wants to see every packet, but maybe we could do
something.)  NAT definitely needs to see every packet on a connection
which is natted.

One way to do this is to add a "have_interest" callback to the
hook_ops, which takes each about-to-be-inserted GUFC entry and adds
any destinies this hook cares about.  In the case of packet filtering
this would do a rule traversal and append a fn/data ptr to the entry
for each rule which could affect the flow (second sketch below).  The
other way is to have the hooks register what they are interested in
into a general data structure which GUFC entry creation then looks up
itself.  That general data structure will need to support wildcards,
though.
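To make (1) concrete, here is a rough sketch of what a GUFC entry
might look like.  Every name in it (flow_tuple, destiny, gufc_entry,
gufc_apply) is invented for illustration; it's plain C you can
compile standalone, not a proposal for the real structures:

/* Sketch only: models the SRCIP/DSTIP/PROTO/SPT/DPT/IN/OUT tuple and
 * a cache entry whose "destinies" are a general array of fn & data
 * ptrs, with the "too hard, do slow path" bit. */
#include <stdbool.h>
#include <stdint.h>

struct flow_tuple {
        uint32_t saddr, daddr;          /* SRCIP, DSTIP */
        uint8_t  proto;                 /* PROTO */
        uint16_t sport, dport;          /* SPT, DPT */
        bool     in, out;               /* src local?  dst local? */
};

/* One cached destiny: a callback plus private data (a route, a
 * socket, a filter verdict, a NAT binding, ...). */
struct destiny {
        int  (*apply)(void *data, void *pkt);
        void *data;
};

#define GUFC_MAX_DESTINIES 8

struct gufc_entry {
        struct flow_tuple tuple;
        bool slow_path;                 /* punt to the full stack */
        unsigned int ndest;
        struct destiny dest[GUFC_MAX_DESTINIES];
};

/* Fast path: apply every cached destiny in order; a non-zero return
 * (e.g. a DROP verdict) ends the chain early. */
static int gufc_apply(struct gufc_entry *e, void *pkt)
{
        unsigned int i;

        if (e->slow_path)
                return -1;              /* caller does the slow path */
        for (i = 0; i < e->ndest; i++) {
                int ret = e->dest[i].apply(e->dest[i].data, pkt);
                if (ret != 0)
                        return ret;
        }
        return 0;
}

The specialized-fields alternative would avoid the indirect calls, at
the cost of the cache knowing about every consumer.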
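And a sketch of the "have_interest" idea from (2), building on the
structures above.  Again, every name (gufc_hook_ops, rule_could_match,
filter_apply_rule) is made up, and the rule matching and verdicts are
stubs:

/* Each hook additionally registers a have_interest callback, called
 * once per flow setup with the about-to-be-inserted GUFC entry. */
struct gufc_hook_ops {
        int (*have_interest)(struct gufc_hook_ops *ops,
                             struct gufc_entry *entry);
        void *priv;                     /* e.g. this hook's ruleset */
};

/* Toy ruleset for the packet filtering case. */
struct rule {
        struct rule *next;
        /* ... match criteria and verdict would live here ... */
};
struct rule_table {
        struct rule *rules;
};

static bool rule_could_match(const struct rule *r,
                             const struct flow_tuple *t)
{
        (void)r; (void)t;
        return true;    /* stub: compare rule's matches to the tuple */
}

static int filter_apply_rule(void *data, void *pkt)
{
        (void)data; (void)pkt;
        return 0;       /* stub: 0 = accept, non-zero = drop etc. */
}

/* Packet filtering's have_interest: traverse the ruleset once at
 * flow setup, appending a destiny for each rule which could affect
 * this flow.  If the entry fills up, punt the flow to slow path. */
static int filter_have_interest(struct gufc_hook_ops *ops,
                                struct gufc_entry *entry)
{
        struct rule_table *t = ops->priv;
        struct rule *r;

        for (r = t->rules; r; r = r->next) {
                if (!rule_could_match(r, &entry->tuple))
                        continue;
                if (entry->ndest == GUFC_MAX_DESTINIES) {
                        entry->slow_path = true;
                        return -1;
                }
                entry->dest[entry->ndest].apply = filter_apply_rule;
                entry->dest[entry->ndest].data = r;
                entry->ndest++;
        }
        return 0;
}

Conntrack's version would append its destiny for the first packets
and then (hopefully) narrow its interest to RST/SYN/FIN; NAT's would
append an unconditional rewrite for natted connections.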
We also need efficient ways of reflecting rule changes into the GUFC.
We can be pretty slack with conntrack timeouts, but we either need to
flush, or handle callbacks from the GUFC on timed-out entries.
Packet filtering changes definitely need to be synchronous.

(3) Smart NICs that do some flowid work themselves can accelerate the
lookup implicitly (the same flow always goes to the same CPU/thread)
or explicitly (each CPU/thread maintains only the part of the GUFC it
needs, or the NIC even returns a flow cookie which is a pointer to a
GUFC entry or subtree?).  AFAICT this will magnify the payoff from
the GUFC.

Sorry for the length,
Rusty.
--
Help! Save Australia from the worst of the DMCA: http://linux.org.au/law