On Tuesday 25 July 2006 01:22, David Miller wrote:
> From: Andi Kleen <[EMAIL PROTECTED]>
> Date: Tue, 25 Jul 2006 01:10:25 +0200
>
> > > All the original costs of route, netfilter, TCP socket lookup all
> > > reappear as we make VJ netchannels fit all the rules of real practical
> > > systems, eliminating their gains entirely.
> >
> > At least most of the optimizations from the early demux scheme could
> > probably be obtained more simply by adding a fast path to
> > iptables/conntrack/etc. that checks whether all rules only match SYN
> > etc. packets and skips the full rule walk in that case (or, more
> > generally, a fast TCP flag mask check similar to what TCP does). With
> > that, ESTABLISHED packets would hit TCP with only relatively small
> > overhead.
>
> Actually, all is not lost. Alexey has a more clever idea, which
> is basically to run the netfilter hooks in the socket receive
> path.
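The flag-mask fast path suggested above can be sketched in userspace C. This is a toy illustration, not the real iptables data structures: `struct rule` models only a hypothetical flags condition, and the precomputed "SYN-only chain" bit is an assumption about how such a fast path might be cached.

```c
#include <assert.h>
#include <stdint.h>

/* TCP flag bits, as laid out in the TCP header's flags octet. */
#define TCP_FLAG_FIN 0x01
#define TCP_FLAG_SYN 0x02
#define TCP_FLAG_RST 0x04
#define TCP_FLAG_ACK 0x10

/* Hypothetical rule: only the TCP flags condition is modeled here. */
struct rule {
    uint8_t flag_mask;  /* which flag bits the rule examines */
    uint8_t flag_cmp;   /* required values of those bits */
};

/*
 * Precompute (e.g. whenever the chain changes) whether every rule can
 * only ever match packets with SYN set, like iptables "--syn" rules.
 */
static int chain_is_syn_only(const struct rule *rules, int n)
{
    for (int i = 0; i < n; i++)
        if (!((rules[i].flag_mask & TCP_FLAG_SYN) &&
              (rules[i].flag_cmp & TCP_FLAG_SYN)))
            return 0;
    return 1;
}

/*
 * Fast path: packets of established connections have SYN clear, so if
 * the whole chain is SYN-only they can skip the full rule walk.
 */
static int fast_path_skip(int syn_only_chain, uint8_t pkt_flags)
{
    return syn_only_chain && !(pkt_flags & TCP_FLAG_SYN);
}
```

With a chain consisting only of `--syn`-style rules, a plain ACK segment takes the fast path while a SYN or SYN-ACK still walks the rules.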
The gain being that the target CPU does the work instead of the softirq
one?

A combined lookup and better handling of ESTABLISHED still seems like a
good idea.

One idea I had at some point was to separate conntrack for local
connections vs. routed connections and attach the local conntrack to the
socket (and use its lookup tables). Then, at least for local connections,
conntrack should be nearly free. It should also solve the issue we
currently have that enabling conntrack makes local network performance
significantly worse.

> Where does state live in such a huge process? Usually, it is
> scattered all over its address space. Let us say that the java
> application just did a lot of churning on its own data
> structure, swapping out TCP library state objects; we will
> prematurely swap that stuff back in just to spit out an ACK
> or similar.

TCP state is usually multiple cache lines, so you would have cache misses
anyway. Or do you worry about the TLBs?

> > But what do you do when you have lots of different connections
> > with different target CPU hash values, or when this would require
> > you to move multiple compute-intensive processes onto a single core?
>
> That is why we have the scheduler :)

It can't do well if it gets conflicting input.

> Even in a best-effort scenario, things
> will be generally better than they are currently, plus there is nothing
> precluding the flow demux MSI-X selection from getting more intelligent.

Intelligent = stateful in this case. AFAIK the only way to do it
stateless is hashes, and the output of hashes tends to be unpredictable
by definition.

> For example, the demuxer could "notice" that TCP data transmits for
> flow X tend to happen on CPU X, and update a flow table to record that
> fact. It could use the same flow table as the one used for LRO.

Hmm, I somewhat doubt that lower-end NICs will ever have such flow tables.
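The "unpredictable hash output" point can be made concrete with a toy 4-tuple flow hash. FNV-1a here is a stand-in assumption; real hardware typically uses a keyed Toeplitz hash, but the property under discussion is the same: which MSI-X vector / RX queue a flow lands on is effectively arbitrary, so two connections of the same process often end up on different CPUs.

```c
#include <stdint.h>

/*
 * Toy 4-tuple flow hash (FNV-1a). A NIC would use something like a
 * Toeplitz hash with a secret key, but either way the result is not
 * predictable or controllable per flow.
 */
static uint32_t flow_hash(uint32_t saddr, uint32_t daddr,
                          uint16_t sport, uint16_t dport)
{
    uint32_t h = 2166136261u;
    uint32_t words[3] = { saddr, daddr,
                          ((uint32_t)sport << 16) | dport };
    for (int i = 0; i < 3; i++) {
        for (int b = 0; b < 4; b++) {
            h ^= (words[i] >> (8 * b)) & 0xff;
            h *= 16777619u;
        }
    }
    return h;
}

/* Pick an MSI-X vector / RX queue (and hence a softirq CPU) from it. */
static unsigned int select_rx_queue(uint32_t hash, unsigned int nqueues)
{
    return hash % nqueues;
}
```

Two connections from the same process to the same server differ only in source port, yet hash to unrelated values, so there is no way for stateless steering to keep them on one CPU.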
Also, the flow tables could always thrash (because the on-NIC RAM is
necessarily limited), or they would require the NIC to look up state in
memory, which is probably not much faster than the CPUs doing it.

Using hash functions in the hardware to select the MSI-X vector seems
more elegant, cheaper, and much more scalable to me. The drawback of
hashes is that for processes with multiple connections you have to move
some work back into the softirqs that run on the MSI-X target CPUs.

So basically, doing process-context TCP fully will require much more
complex and stateful hardware. Or you can keep it only as a fast path for
specific situations (a single busy connection per thread) and stay with
mostly-softirq processing for the many-connection cases.

-Andi