On Fri, 2006-02-03 at 18:48, Andi Kleen wrote:
> On Friday 03 February 2006 02:07, Greg Banks wrote:
>
> > > (Don't ask for code - it's not really in a usable state)
> >
> > Sure. I'm looking forward to it.
>
> I had actually shelved the idea because of TSO. But if you can get me
> some data from your NFS servers that shows TSO is not enough
On Friday 03 February 2006 02:07, Greg Banks wrote:
> > (Don't ask for code - it's not really in a usable state)
>
> Sure. I'm looking forward to it.
I had actually shelved the idea because of TSO. But if you can get me
some data from your NFS servers that shows TSO is not enough
for them that
From: Greg Banks <[EMAIL PROTECTED]>
Date: Fri, 03 Feb 2006 12:08:54 +1100
> So, given 2.6.16 on tg3 hardware, would your advice be to
> enable TSO by default?
Yes.
In fact I've been meaning to discuss with Michael Chan
enabling it in the driver by default.
On Fri, 2006-02-03 at 01:41, Leonid Grossman wrote:
>
> As I mentioned earlier, it would be cool to get these moderation
> thresholds from NAPI, since it can make a better guess about the overall
> system utilization than the driver can.
Agreed.
> But even at the driver level,
> this works reas
On Thu, 2006-02-02 at 18:51, David S. Miller wrote:
> From: Greg Banks <[EMAIL PROTECTED]>
> Date: Thu, 02 Feb 2006 18:31:49 +1100
>
> > On Thu, 2006-02-02 at 17:45, Andi Kleen wrote:
> > > Normally TSO was supposed to fix that.
> >
> > Sure, except that the last time SGI looked at TSO it was
> > extremely flaky. I gather that's much better now, but TSO still
On Thu, 2006-02-02 at 18:48, Andi Kleen wrote:
> On Thursday 02 February 2006 08:31, Greg Banks wrote:
>
> > [...]SGI's solution is to ship a script that uses ethtool
> > at boot to tune rx-usecs, rx-frames, rx-usecs-irq, rx-frames-irq
> > up from the defaults.
>
> All user tuning like this is
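
(For reference: the tuning such a script performs goes through the ethtool
coalescing ioctl. A minimal userspace sketch follows; the interface name
and threshold values are illustrative, not SGI's actual numbers.)

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(void)
{
	struct ethtool_coalesce ec;
	struct ifreq ifr;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);
	ifr.ifr_data = (char *)&ec;

	/* Read the driver's current settings first. */
	memset(&ec, 0, sizeof(ec));
	ec.cmd = ETHTOOL_GCOALESCE;
	if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
		perror("ETHTOOL_GCOALESCE");
		return 1;
	}

	/* Raise the rx moderation thresholds, as
	 * "ethtool -C eth0 rx-usecs 75 rx-frames 40 ..." would. */
	ec.cmd = ETHTOOL_SCOALESCE;
	ec.rx_coalesce_usecs = 75;
	ec.rx_max_coalesced_frames = 40;
	ec.rx_coalesce_usecs_irq = 75;
	ec.rx_max_coalesced_frames_irq = 40;
	if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
		perror("ETHTOOL_SCOALESCE");
		return 1;
	}
	close(fd);
	return 0;
}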
> -----Original Message-----
> From: Andi Kleen [mailto:[EMAIL PROTECTED]
> Why are you saying it can't be used by the host? The stack
> should be fully ready for it.
Sorry, I should have said "it can't be used by the host to the full
potential of the feature" :-).
It does work for us now, a
Andi Kleen wrote:
On Thursday 02 February 2006 08:31, Greg Banks wrote:
The tg3 driver uses small hardcoded values for the RXCOL_TICKS
and RXMAX_FRAMES registers, and allows "ethtool -C" to change
them. SGI's solution is to ship a script that uses ethtool
at boot to tune rx-usecs, rx-frames, rx-usecs-irq, rx-frames-irq
Leonid Grossman writes:
> Right. Interrupt moderation is done on per channel basis.
> The only addition to the current NAPI mechanism I'd like to see is to
> have NAPI setting desired interrupt rate (once interrupts are ON),
> rather than use an interrupt per packet or a driver default. Argu
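
(A sketch of the shape such a hook could take. The register offset, the
struct, and the callback itself are all invented for illustration; no such
interface exists in the kernel.)

#include <linux/io.h>

#define EXAMPLE_ITR_REG	0x00c8	/* placeholder: inter-interrupt delay, usecs */

struct example_nic {
	void __iomem *regs;
};

/* Hypothetical driver hook the stack would call when it re-enables the
 * device's interrupts after polling, passing the interrupt rate it
 * currently considers sustainable for the whole system. */
static void example_irq_on(struct example_nic *nic, unsigned int irqs_per_sec)
{
	/* 0 means "no moderation", i.e. one interrupt per packet. */
	unsigned int usecs = irqs_per_sec ? 1000000 / irqs_per_sec : 0;

	writel(usecs, nic->regs + EXAMPLE_ITR_REG);
	/* ... then unmask the rx interrupt as the driver normally would. */
}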
Oh you have TSO disabled? That explains a lot.
Yes, it's been a bumpy road, and there are still some
e1000 lockups, but in general things should be smooth
these days.
I suspect that "these days" in kernel.org terms differs somewhat from "these
days" in RH/SuSE/etc terms, hence TSO being disabled
On Wed, 01 Feb 2006 16:29:11 -0800 (PST)
"David S. Miller" <[EMAIL PROTECTED]> wrote:
> From: Stephen Hemminger <[EMAIL PROTECTED]>
> Date: Wed, 1 Feb 2006 16:12:14 -0800
>
> > The bigger problem I see is scalability. All those mmap rings have to
> > be pinned in memory to be useful. It's fine for a single smart application
On Thursday 02 February 2006 17:27, Leonid Grossman wrote:
> By now we have submitted UFO, MSI-X and LRO patches. The one item on
> the TODO list that we did not submit a full driver patch for is the
> "support for distributing receive processing across multiple CPUs (using
> NIC hw queues)", mai
> -----Original Message-----
> From: Eric W. Biederman [mailto:[EMAIL PROTECTED]
> How do you classify channels?
Multiple rx steering criteria are available, for example tcp tuple (or
subset) hash, direct tcp tuple (or subset) match, MAC address, pkt size,
vlan tag, QOS bits, etc.
>
> If
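
(Illustration of the simplest of those criteria: hash the tcp tuple to pick
an rx channel, so every packet of a flow lands on the same CPU. The mixing
function below is made up; real NICs typically use a Toeplitz hash over the
tuple.)

#include <stdint.h>

static uint32_t flow_hash(uint32_t saddr, uint32_t daddr,
			  uint16_t sport, uint16_t dport)
{
	uint32_t h = saddr ^ daddr ^ ((uint32_t)sport << 16 | dport);

	h ^= h >> 16;
	h *= 0x45d9f3bu;	/* arbitrary odd constant, just for mixing */
	h ^= h >> 16;
	return h;
}

/* All packets of one flow map to the same channel, hence the same CPU. */
static unsigned int pick_rx_channel(uint32_t saddr, uint32_t daddr,
				    uint16_t sport, uint16_t dport,
				    unsigned int nchannels)
{
	return flow_hash(saddr, daddr, sport, dport) % nchannels;
}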
Thanks to Andi, Dave, Jeff and everyone who responded to the original
query; I've got enough pointers to presentations, blogs and ideas to
keep me busy for a while :-)
VJ channels indeed seem to complement and take to a different level some
sw and hw ideas on Dave's TODO list.
By now we have submitted UFO, MSI-X and LRO patches.
On Thu, 02 Feb 2006 08:35:28 -0700
[EMAIL PROTECTED] (Eric W. Biederman) wrote:
> "Christopher Friesen" <[EMAIL PROTECTED]> writes:
>
> > Eric W. Biederman wrote:
> >> Jeff Garzik <[EMAIL PROTECTED]> writes:
> >
> >>> This was discussed on the netdev list, and the conclusion was that
> >>> you want both NAPI and hw mitigation. This was implemented in a
> >>> few drivers, at least.
"Leonid Grossman" <[EMAIL PROTECTED]> writes:
> There are two facilities (at least, in our ASIC, but there is no reason this
> can't be part of the generic multi-channel driver interface that I will
> get to shortly) to deal with it.
>
> - hardware supports more than one utilization-based interrupt ra
"Christopher Friesen" <[EMAIL PROTECTED]> writes:
> Eric W. Biederman wrote:
>> Jeff Garzik <[EMAIL PROTECTED]> writes:
>
>>> This was discussed on the netdev list, and the conclusion was that
>>> you want both NAPI and hw mitigation. This was implemented in a
>>> few drivers, at least.
>
>> How does that deal with the latency that hw mitigation introduces?
> -----Original Message-----
> From: Andi Kleen [mailto:[EMAIL PROTECTED]
> > You just need to make sure that you don't leak data from
> > other people's sockets.
>
> There are three basic ways I can see to do this:
>
> - You have really advanced hardware which can potentially
> manage
Eric W. Biederman wrote:
Jeff Garzik <[EMAIL PROTECTED]> writes:
This was discussed on the netdev list, and the conclusion was that
you want both NAPI and hw mitigation. This was implemented in a
few drivers, at least.
How does that deal with the latency that hw mitigation introduces?
When
> -----Original Message-----
> From: Eric W. Biederman [mailto:[EMAIL PROTECTED]
> Sent: Thursday, February 02, 2006 4:29 AM
> To: Jeff Garzik
> Cc: Andi Kleen; Greg Banks; David S. Miller; Leonid Grossman;
> [EMAIL PROTECTED]; Linux Network Development list
> Subject:
Jeff Garzik <[EMAIL PROTECTED]> writes:
> Andi Kleen wrote:
>> There was already talk some time ago to make NAPI drivers use
>> the hardware mitigation again. The reason is when you have
>
>
> This was discussed on the netdev list, and the conclusion was that you want
> both NAPI and hw mitigation. This was implemented in a few
> drivers, at least.
From: Greg Banks <[EMAIL PROTECTED]>
Date: Thu, 02 Feb 2006 18:31:49 +1100
> On Thu, 2006-02-02 at 17:45, Andi Kleen wrote:
> > Normally TSO was supposed to fix that.
>
> Sure, except that the last time SGI looked at TSO it was
> extremely flaky. I gather that's much better now, but TSO
> still
On Thursday 02 February 2006 08:31, Greg Banks wrote:
> The tg3 driver uses small hardcoded values for the RXCOL_TICKS
> and RXMAX_FRAMES registers, and allows "ethtool -C" to change
> them. SGI's solution is to ship a script that uses ethtool
> at boot to tune rx-usecs, rx-frames, rx-usecs-irq, rx-frames-irq
On Thursday 02 February 2006 00:50, David S. Miller wrote:
>
> Why not concentrate your thinking on how it can be made to
> _work_ instead of punching holes in the idea? Isn't that more
> productive?
What I think would be very practical to do would be to try to
replace the socket rx queue
On Thursday 02 February 2006 00:08, Jeff Garzik wrote:
> Definitely not. POSIX AIO is far more complex than the operation
> requires,
Ah, I sense a strong NIH field.
> and is particularly bad for implementations that find it wise
> to queue a bunch of to-be-filled buffers.
Why? lio_listio se
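
(For reference, the lio_listio(3) batching under discussion looks like
this: a whole set of to-be-filled buffers queued against a socket in one
call. Error handling trimmed; link with -lrt.)

#include <aio.h>
#include <stdlib.h>
#include <string.h>

#define NBUF	8
#define BUFSZ	16384

int queue_reads(int sockfd)
{
	static struct aiocb cbs[NBUF];
	struct aiocb *list[NBUF];
	int i;

	for (i = 0; i < NBUF; i++) {
		memset(&cbs[i], 0, sizeof(cbs[i]));
		cbs[i].aio_fildes = sockfd;
		cbs[i].aio_buf = malloc(BUFSZ);
		cbs[i].aio_nbytes = BUFSZ;
		cbs[i].aio_lio_opcode = LIO_READ;
		list[i] = &cbs[i];
	}
	/* LIO_NOWAIT queues all NBUF reads and returns immediately;
	 * completions are then reaped with aio_error()/aio_return(). */
	return lio_listio(LIO_NOWAIT, list, NBUF, NULL);
}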
On Thu, 2006-02-02 at 17:45, Andi Kleen wrote:
> There was already talk some time ago to make NAPI drivers use
> the hardware mitigation again. The reason is when you have
> a workload that runs below overload and doesn't quite
> fill the queues and is a bit bursty, then NAPI tends to turn
> on/off
Andi Kleen wrote:
There was already talk some time ago to make NAPI drivers use
the hardware mitigation again. The reason is when you have
This was discussed on the netdev list, and the conclusion was that you
want both NAPI and hw mitigation. This was implemented in a few
drivers, at least.
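
(Schematically, combining the two in a 2.6-era driver: the poll routine
stays plain NAPI, and the driver simply leaves the chip's coalescing timer
programmed when it unmasks the rx interrupt. The mynic_* names are
placeholders.)

#include <linux/netdevice.h>

struct mynic;					/* placeholder private struct */
extern int mynic_rx(struct mynic *np, int limit);	/* placeholder */
extern void mynic_unmask_rx_irq(struct mynic *np);	/* placeholder */

static int mynic_poll(struct net_device *dev, int *budget)
{
	struct mynic *np = netdev_priv(dev);
	int limit = min(*budget, dev->quota);
	int done = mynic_rx(np, limit);

	*budget -= done;
	dev->quota -= done;

	if (done < limit) {
		netif_rx_complete(dev);
		/* Unmask the rx interrupt WITHOUT zeroing the chip's
		 * coalescing registers, so a burst arriving right after
		 * this raises one moderated interrupt, not one per
		 * packet. */
		mynic_unmask_rx_irq(np);
		return 0;
	}
	return 1;	/* more work pending: stay on the poll list */
}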
On Thursday 02 February 2006 07:49, David S. Miller wrote:
> From: Andi Kleen <[EMAIL PROTECTED]>
> Date: Thu, 2 Feb 2006 07:45:26 +0100
>
> > Don't think it was ever implemented though. In the end we just
> > eat the slowdown in that particular load.
>
> The tg3 driver uses the chip interrupt mitigation to help
> deal with the SGI NUMA issues resulting from NAPI.
On Thursday 02 February 2006 00:37, Mitchell Blank Jr wrote:
> Jeff Garzik wrote:
> > Once packets are classified to be delivered to a specific local host socket,
> > what further operations require privs? What received packet data
> > cannot be exposed to userspace?
>
> You just need to make sure that you don't leak data from other people's
> sockets.
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Andi Kleen
> Sent: Wednesday, February 01, 2006 10:45 PM
> There was already talk some time ago to make NAPI drivers use
> the hardware mitigation again. The reason is when you have a
> workload
On Thursday 02 February 2006 04:19, Greg Banks wrote:
> On Thu, 2006-02-02 at 14:13, David S. Miller wrote:
> > From: Greg Banks <[EMAIL PROTECTED]>
> > Date: Thu, 02 Feb 2006 14:06:06 +1100
> >
> > > On Thu, 2006-02-02 at 13:46, David S. Miller wrote:
> > > > I know SAMBA is using sendfile() (when the client has the oplock held,
> > > > which basically is "always"), is NFS doing so as well?
From: Andi Kleen <[EMAIL PROTECTED]>
Date: Thu, 2 Feb 2006 07:45:26 +0100
> Don't think it was ever implemented though. In the end we just
> eat the slowdown in that particular load.
The tg3 driver uses the chip interrupt mitigation to help
deal with the SGI NUMA issues resulting from NAPI.
On Thursday 02 February 2006 02:53, Greg Banks wrote:
> On Thu, 2006-02-02 at 08:11, David S. Miller wrote:
> > Van is not against NAPI, in fact he's taking NAPI to the next level.
> > Softirq handling is overhead, and as this work shows, it is totally
> > unnecessary overhead.
>
> I got the impression that his code was dynamically changing the
> e1000 interrupt mitigation registers in response to load
On Thu, 2006-02-02 at 14:32, David S. Miller wrote:
> I see.
>
> Maybe we can be smarter about how the write(), CORK, sendfile,
> UNCORK sequence is done.
From the NFS server's point of view, the ideal interface would
be to pass an array of {page,offset,len} tuples, covering up to
around 1 MiB+1
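
(That is, something shaped roughly like the following. The struct mirrors
the kernel's existing bio_vec; sock_sendpagev() itself is hypothetical.)

#include <linux/mm_types.h>
#include <linux/net.h>

struct page_vec {
	struct page	*page;
	unsigned int	offset;
	unsigned int	len;
};

/* Hypothetical: queue the RPC header plus all file pages of one reply
 * into TCP in a single call, taking the socket lock and traversing the
 * qdisc once, instead of once per page as with repeated sendpage(). */
int sock_sendpagev(struct socket *sock, const struct page_vec *vec,
		   int nr_vecs, int flags);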
From: Greg Banks <[EMAIL PROTECTED]>
Date: Thu, 02 Feb 2006 14:19:43 +1100
> Multiple trips down through TCP, qdisc, and the driver for each
> NFS packet sent: one for the header and one for each page. Lots
of locks need to be taken and dropped, all this while multiple nfsds
> on multiple CPUs a
On Thu, 2006-02-02 at 14:13, David S. Miller wrote:
> From: Greg Banks <[EMAIL PROTECTED]>
> Date: Thu, 02 Feb 2006 14:06:06 +1100
>
> > On Thu, 2006-02-02 at 13:46, David S. Miller wrote:
> > > I know SAMBA is using sendfile() (when the client has the oplock held,
> > > which basically is "always"), is NFS doing so as well?
From: Greg Banks <[EMAIL PROTECTED]>
Date: Thu, 02 Feb 2006 14:06:06 +1100
> On Thu, 2006-02-02 at 13:46, David S. Miller wrote:
> > I know SAMBA is using sendfile() (when the client has the oplock held,
> > which basically is "always"), is NFS doing so as well?
>
> NFS is an in-kernel server, and uses sock->ops->sendpage directly.
On Thu, 2006-02-02 at 13:46, David S. Miller wrote:
> I know SAMBA is using sendfile() (when the client has the oplock held,
> which basically is "always"), is NFS doing so as well?
NFS is an in-kernel server, and uses sock->ops->sendpage directly.
> Van does have some ideas in mind for TX net channels
From: Greg Banks <[EMAIL PROTECTED]>
Date: Thu, 02 Feb 2006 12:53:14 +1100
> I got the impression that his code was dynamically changing the
> e1000 interrupt mitigation registers in response to load, in
> other words using the capabilities of the hardware in a way that
> NAPI deliberately avoids
David S. Miller wrote:
From: Rick Jones <[EMAIL PROTECTED]>
Date: Wed, 01 Feb 2006 17:32:24 -0800
How large is "the bulk?"
The prequeue is always enabled when the app has blocked
on read().
Actually I meant in terms of percentage of the cycles to process the packet
rather than frequency
On Thu, 2006-02-02 at 08:11, David S. Miller wrote:
> Van is not against NAPI, in fact he's taking NAPI to the next level.
> Softirq handling is overhead, and as this work shows, it is totally
> unnecessary overhead.
I got the impression that his code was dynamically changing the
e1000 interrupt mitigation registers in response to load, in
other words using the capabilities of the hardware in a way that
NAPI deliberately avoids
From: Rick Jones <[EMAIL PROTECTED]>
Date: Wed, 01 Feb 2006 17:32:24 -0800
> How large is "the bulk?"
The prequeue is always enabled when the app has blocked
on read().
> > Ie. ACK goes out as fast as we can context switch
> > to the app receiving the data. This feedback makes all senders
>
Maybe I'm not sufficiently clued-in, but in broad handwaving terms,
it seems today that all three can be taking place in parallel for a
given TCP connection. The application is doing its
application-level thing on request N on one CPU, while request N+1
is being processed by TCP on another CPU, w
From: Rick Jones <[EMAIL PROTECTED]>
Date: Wed, 01 Feb 2006 16:39:00 -0800
> My questions are meant to see if something is even a roadblock in
> the first place.
Fair enough.
> Maybe I'm not sufficiently clued-in, but in broad handwaving terms,
> it seems today that all three can be taking place in parallel for a
> given TCP connection.
David S. Miller wrote:
From: Rick Jones <[EMAIL PROTECTED]>
Date: Wed, 01 Feb 2006 15:50:38 -0800
[ What sucks about this whole thread is that only folks like
Jeff and myself are attempting to think and use our imagination
to consider how some roadblocks might be overcome ]
My questions are meant to see if something is even a roadblock in
the first place.
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Wed, 1 Feb 2006 16:12:14 -0800
> The bigger problem I see is scalability. All those mmap rings have to
> be pinned in memory to be useful. It's fine for a single smart application
> per server environment, but in real world with many dumb thread m
On Wed, 01 Feb 2006 15:42:39 -0800 (PST)
"David S. Miller" <[EMAIL PROTECTED]> wrote:
> From: Andi Kleen <[EMAIL PROTECTED]>
> Date: Wed, 1 Feb 2006 23:55:11 +0100
>
> > On Wednesday 01 February 2006 21:26, Jeff Garzik wrote:
> > > Andi Kleen wrote:
> > > > But I don't think Van's design is supposed to be exposed to user space.
From: Rick Jones <[EMAIL PROTECTED]>
Date: Wed, 01 Feb 2006 15:50:38 -0800
[ What sucks about this whole thread is that only folks like
Jeff and myself are attempting to think and use our imagination
to consider how some roadblocks might be overcome ]
> If the TCP processing is put in the
It almost feels like the channel concept wants a "thread per
connection" model?
No, it means only that your application must be asynchronous -- which
all modern network apps are already.
The INN model of a single process calling epoll(2) for 800 sockets
should continue to work, as should th
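
(The INN model referenced above, for anyone following along. Error handling
and the application's read-side logic are omitted.)

#include <sys/epoll.h>

#define MAX_EVENTS	64

extern void handle_readable(int fd);	/* application-defined */

void event_loop(int *socks, int nsocks)
{
	struct epoll_event ev, events[MAX_EVENTS];
	int epfd = epoll_create(nsocks);	/* size is only a hint */
	int i, n;

	for (i = 0; i < nsocks; i++) {
		ev.events = EPOLLIN;
		ev.data.fd = socks[i];
		epoll_ctl(epfd, EPOLL_CTL_ADD, socks[i], &ev);
	}
	for (;;) {
		/* One thread, one syscall, readiness for all sockets. */
		n = epoll_wait(epfd, events, MAX_EVENTS, -1);
		for (i = 0; i < n; i++)
			handle_readable(events[i].data.fd);
	}
}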
From: Mitchell Blank Jr <[EMAIL PROTECTED]>
Date: Wed, 1 Feb 2006 15:37:04 -0800
> So I agree that this would have to be CAP_NET_ADMIN only.
I'm drowning in all of this pessimism folks.
Why not concentrate your thinking on how it can be made to
_work_ instead of punching holes in the idea? Isn't that more productive?
From: Andi Kleen <[EMAIL PROTECTED]>
Date: Wed, 1 Feb 2006 23:55:11 +0100
> On Wednesday 01 February 2006 21:26, Jeff Garzik wrote:
> > Andi Kleen wrote:
> > > But I don't think Van's design is supposed to be exposed to user space.
> >
> > It is supposed to be exposed to userspace AFAICS.
>
> Then it's likely insecure and root only, unless he knows some magic
> that we don't.
Jeff Garzik wrote:
> Once packets are classified to be delivered to a specific local host socket,
> what further operations require privs? What received packet data
> cannot be exposed to userspace?
You just need to make sure that you don't leak data from other people's
sockets. Two issues I see
But people who care about the performance of their networking apps are
likely to want to switch over to this new userspace networking API, over
the next decade, I think.
Yet there needs to be some cross-platform commonality for the API, yes? That was
the main thrust behind my simplistic asking
Andi Kleen wrote:
On Wednesday 01 February 2006 21:26, Jeff Garzik wrote:
Andi Kleen wrote:
But I don't think Van's design is supposed to be exposed to user space.
It is supposed to be exposed to userspace AFAICS.
Then it's likely insecure and root only, unless he knows some magic
that we don't.
Rick Jones wrote:
what are the implications for having the application churning away doing
application things while TCP is feeding it data? Or for an application
that is processing more than one TCP connection in a given thread?
It almost feels like the channel concept wants a "thread per connection" model?
Rick Jones wrote:
Jeff Garzik wrote:
Key point 1:
Van's slides align closely with the design that I was already working
on, for zero-copy RX.
To have a fully async, zero copy network receive, POSIX read(2) is
inadequate.
Is there an aio_read() in POSIX adequate to the task?
Definitely not. POSIX AIO is far more complex than the operation requires,
On Wednesday 01 February 2006 21:26, Jeff Garzik wrote:
> Andi Kleen wrote:
> > But I don't think Van's design is supposed to be exposed to user space.
>
> It is supposed to be exposed to userspace AFAICS.
Then it's likely insecure and root only, unless he knows some magic
that we don't.
I hope
On Wednesday 01 February 2006 22:11, David S. Miller wrote:
> From: Andi Kleen <[EMAIL PROTECTED]>
> Date: Wed, 1 Feb 2006 19:28:46 +0100
>
> > http://www.lemis.com/grog/Documentation/vj/lca06vj.pdf
>
> I did a writeup in my blog about all of this, another good
> reason to actively follow my blog:
> http://vger.kernel.org/~davem/cgi-bin/blog.cgi/index.html
At the risk of being told to launch myself towards a body of water...
So, sort of linking with the data about saturating a GbE both ways on a single
TCP connection, and how it required binding netperf to the CPU other than the
one taking interrupts... If channels are taken to their limit, and t
From: Jeff Garzik <[EMAIL PROTECTED]>
Date: Wed, 01 Feb 2006 14:37:46 -0500
> So, I am not concerned with slideware. These are two good ideas that
> are worth pursuing, even if Van produces zero additional output.
Right.
And, to all of you having trouble imagining how else you'd apply these
ne
From: Andi Kleen <[EMAIL PROTECTED]>
Date: Wed, 1 Feb 2006 20:50:31 +0100
> On Wednesday 01 February 2006 20:37, Jeff Garzik wrote:
>
> > To have a fully async, zero copy network receive, POSIX read(2) is
> > inadequate.
>
> Agreed, but POSIX aio is adequate.
No, it's a joke.
To do this stu
From: Andi Kleen <[EMAIL PROTECTED]>
Date: Wed, 1 Feb 2006 19:28:46 +0100
> http://www.lemis.com/grog/Documentation/vj/lca06vj.pdf
I did a writeup in my blog about all of this, another good
reason to actively follow my blog:
http://vger.kernel.org/~davem/cgi-bin/blog.cgi/index.html
Go r
Andi Kleen wrote:
But I don't think Van's design is supposed to be exposed to user space.
It is supposed to be exposed to userspace AFAICS.
It's still in the kernel, just in process context.
Incorrect. It's in the userspace app (though usually via a library).
See slides 26 and 27.
But i
On 2/1/06, Andi Kleen <[EMAIL PROTECTED]> wrote:
> On Wednesday 01 February 2006 20:37, Jeff Garzik wrote:
>
> > To have a fully async, zero copy network receive, POSIX read(2) is
> > inadequate.
>
> Agreed, but POSIX aio is adequate.
>
> > One needs a ring buffer, similar in API to the mmap'd
> >
Andi writes:
> But I don't think Van's design is supposed to be exposed to user space.
> It's just a better way to implement BSD sockets.
Actually, it can, indeed, go all the way to user space - connecting
channels to the socket layer was one of the intermediate steps.
FWIW, I did an article on
Jeff Garzik wrote:
Key point 1:
Van's slides align closely with the design that I was already working
on, for zero-copy RX.
To have a fully async, zero copy network receive, POSIX read(2) is
inadequate.
Is there an aio_read() in POSIX adequate to the task?
One needs a ring buffer, similar in API to the mmap'd
packet socket, where you can queue a whole bunch of reads.
On Wednesday 01 February 2006 20:37, Jeff Garzik wrote:
> To have a fully async, zero copy network receive, POSIX read(2) is
> inadequate.
Agreed, but POSIX aio is adequate.
> One needs a ring buffer, similar in API to the mmap'd
> packet socket, where you can queue a whole bunch of reads.
Key point 1:
Van's slides align closely with the design that I was already working
on, for zero-copy RX.
To have a fully async, zero copy network receive, POSIX read(2) is
inadequate. One needs a ring buffer, similar in API to the mmap'd
packet socket, where you can queue a whole bunch of reads.
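
(For reference, the mmap'd packet socket API being used as the model:
TPACKET_V1 receive on an AF_PACKET socket, where the kernel fills frame
slots in the mapped ring and the process consumes them in place. Error
handling omitted; opening the socket needs CAP_NET_RAW, which ties into
the privilege question raised elsewhere in the thread.)

#include <sys/socket.h>
#include <sys/mman.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>
#include <arpa/inet.h>

void *map_rx_ring(int *fd_out)
{
	int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
	struct tpacket_req req = {
		.tp_block_size	= 4096,
		.tp_block_nr	= 64,
		.tp_frame_size	= 2048,
		.tp_frame_nr	= 128,	/* nr_blocks * frames per block */
	};
	void *ring;

	setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req));

	/* Each 2 KiB slot begins with a struct tpacket_hdr whose tp_status
	 * flips to TP_STATUS_USER once the kernel has written a frame into
	 * it; the app reads in place and flips it back to
	 * TP_STATUS_KERNEL. */
	ring = mmap(NULL, (size_t)req.tp_block_size * req.tp_block_nr,
		    PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	*fd_out = fd;
	return ring;
}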
On Wednesday 01 February 2006 14:48, Leonid Grossman wrote:
> David S. Miller wrote:
>
> > And with Van Jacobson net channels, none of this is going to
> > matter and 512 is going to be your limit whether you like it
> > or not. So this short term complexity gain