On Wed, Oct 10, 2007 at 02:25:50AM -0700, David Miller wrote:
> The chip I was working with at the time (UltraSPARC-IIi) compressed
> all the linear stores into 64-byte full cacheline transactions via
> the store buffer.

That's a pretty old CPU. Conclusions on more modern ones might be different.

> In fact, such a thing might not pan out well, because most of the time
> you write a single descriptor or two, and that isn't a full cacheline,
> which means a read/modify/write is the only coherent way to make such
> a write to RAM.

x86 WC does R-M-W and is coherent, of course. The main difference is
just that the result is not cached. When the hardware accesses the cache
line, the cache should also be invalidated.

> Sure you could batch, but I'd rather give the chip work to do unless
> I unequivocally knew I'd have enough pending to fill a cacheline's
> worth of descriptors.  And since you suggest we shouldn't queue in
> software... :-)

Hmm, you're right, it would probably need to be coupled with batched
submission when multiple packets are available. Probably not worth doing
explicit queueing though.
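
Roughly what I have in mind, as a userspace sketch rather than driver
code. Everything here is made up for illustration: the 16-byte
descriptor layout, the ring size, the tx_queue()/tx_flush() names and
the "more" hint from the caller. The ring is plain memory standing in
for a WC mapping, and there is no completion handling, so the ring
simply wraps and overwrites old entries.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CACHELINE_BYTES 64
#define DESC_BYTES      16  /* assumed descriptor size, not from a real NIC */
#define DESCS_PER_LINE  (CACHELINE_BYTES / DESC_BYTES)
#define RING_DESCS      (8 * DESCS_PER_LINE)

/* Hypothetical TX descriptor. */
struct tx_desc {
    uint64_t addr;
    uint32_t len;
    uint32_t flags;
};

_Static_assert(sizeof(struct tx_desc) == DESC_BYTES, "descriptor size");

struct tx_ring {
    struct tx_desc ring[RING_DESCS];      /* stands in for the WC-mapped ring */
    struct tx_desc stage[DESCS_PER_LINE]; /* one line staged in cached memory */
    unsigned staged;          /* descriptors waiting in stage[] */
    unsigned tail;            /* next free ring slot */
    unsigned full_flushes;    /* flushes that covered a whole aligned line */
    unsigned partial_flushes; /* flushes that covered less than a line */
};

static void tx_ring_init(struct tx_ring *r)
{
    memset(r, 0, sizeof(*r));
}

/* Copy the staged descriptors to the ring in one burst. */
static void tx_flush(struct tx_ring *r)
{
    unsigned i;

    if (r->staged == 0)
        return;

    if (r->staged == DESCS_PER_LINE && r->tail % DESCS_PER_LINE == 0)
        r->full_flushes++;
    else
        r->partial_flushes++;

    for (i = 0; i < r->staged; i++)
        r->ring[(r->tail + i) % RING_DESCS] = r->stage[i];

    r->tail = (r->tail + r->staged) % RING_DESCS;
    r->staged = 0;
    /* A real driver would need a write barrier and a doorbell write here. */
}

/*
 * Stage one descriptor. "more" is a hint that the caller already has
 * another packet in hand. Nothing is held back waiting for future
 * traffic: without the hint the descriptor goes out immediately.
 */
static void tx_queue(struct tx_ring *r, uint64_t addr, uint32_t len, int more)
{
    assert(r->staged < DESCS_PER_LINE);

    r->stage[r->staged].addr = addr;
    r->stage[r->staged].len = len;
    r->stage[r->staged].flags = 0;
    r->staged++;

    if (r->staged == DESCS_PER_LINE || !more)
        tx_flush(r);
}
```

Four back-to-back packets with the hint set on the first three go out
as one full-line write. A lone packet still goes out right away as a
partial write, which is the case you are worried about.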

I suppose it would be an interesting experiment at least.

-Andi