On Sat, Dec 03, 2005 at 09:20:52AM -0500, jamal wrote:
> > That's not quite correct IMHO. The prefetching can get cachelines
> > in-flight which will reduce the CPU stall (in the case the cacheline
> > hasn't arrived before CPU asked for it).
...
> You seem to say that if s/ware schedules a prefetch, when the CPU
> needs to load that location into cache it will "know" that a prefetch
> has already been issued?
For RISC, I'm pretty sure that's the case. But I'm no HW designer.
If a fetch request is already outstanding for a particular cacheline,
a CPU that can issue multiple fetches (most RISC) would waste a lot of
memory bus bandwidth re-issuing fetches for the same line. I would hope
cache controllers implement this simple optimization.

> - prefetching has dependencies on workload, memory latencies, cache
> sizes and CPU architecture. On the size of the cache in regards to when
> you scheduled the prefetch:
> a) if you call prefetch too early, cache size (and workload/code)
> dependent, it may be evicted before you get to it. Prefetching may also
> displace data that is in use.
> b) if you fetch it too late, it won't be there when you get to needing
> to use it, and the CPU will have to fetch it again.

You are sort of repeating the same mistake in (b) that I tried to
explain before (below). The prefetch benefits if the cacheline is in
flight and reduces the CPU stall by more than it costs to issue the
prefetch. (Yes, there may be other side effects the prefetch has to
additionally compensate for, like eviction of cachelines we are about
to re-use and additional BW "wasted" if the prefetch isn't used. But I
want to keep the basic idea obvious: a first-order guess is relatively
simple math and can be measured with "get_cycles()".)

> > "the still gonna stall" case has to be evaluated for how long we stall
> > and if the prefetching helped (or not). ie stalling on AMD64 local memory
> > is not as bad as stalling on remote NUMA memory. And it depends on how far
> > in advance we can prefetch.
>
> Ok, so you seem to be saying again that for case #b above, there is no
> harm in issuing the prefetch late since the CPU won't issue a second
> fetch for the address?

Right. (When that's not true, add it to the "cost of issuing a prefetch".)

> I was hoping to just be able to turn off the prefetch from the driver
> when i know my hardware does well using it.
I agree with Dave Miller - we should disable the prefetch at compile
time for the CPU/platform combinations we know don't benefit from it.
It has to be decided on a case-by-case basis.

> I suspect most newer hardware won't have observable issues given that
> memory latencies have improved over the last 2-3 years; but we run on a
> lot of older hardware too.

In general, I thought the trend was that memory latency has *increased*
when measured in clock cycles (vs wall-clock time). And given that
caches are also bigger now, prefetch should benefit in more cases than,
say, 5 years ago.

thanks,
grant
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html