On Fri, 2005-06-03 at 18:47, Matthew Dillon wrote:
> :This is normal behaviour.
> :Take a look at IA-32 Intel Developers ... Vol 3,
> :Section: 7.2.2 for details + solutions.
> :
> :Stephan
>
> Ok.. that section seems to indicate that speculative reads
> can pass writes, but it also says that the pipeline sniffs the address
> within the processor and ensures proper ordering.  The latter part
> makes sense within the context of a single cpu, but the big question is:
> Is that supposed to hold true for interactions with HT cpus (that share
> the pipeline) as well?  Or not?  It seems not.

Memory ordering in logical HT CPUs is the same as in real CPUs
(see 7.6.1.9).

> Speculative reads creating out of order situations seem to be the
> biggest issue.  The AMD manual (Programmers manual volume 3 page
> 186, MFENCE instruction) says this:
>
> "The MFENCE instruction is weakly-ordered with respect to data and
> instruction prefetches.  Speculative loads initiated by the processor,
> or specified explicitly using cache-prefetch instructions, can be
> reordered around an MFENCE".

Speculative loads can pass MFENCE, but they cannot pass load operations
issued before the MFENCE.

> This seems to be different than what the Intel manual says, and doesn't
> make much sense.  What's the point of having a fence instruction if it
> can't guarantee read/write ordering?  Is the AMD manual simply wrong?

Not wrong, just confusing.

    READ A
    MFENCE
    READ B

can cause

    READ A
    Speculative READ B
    MFENCE

but NOT

    Speculative READ B
    READ A
    MFENCE

> Other than that, the Intel manual does indicate that speculative reads
> will not pass locked bus cycle instructions (the AMD manual says nothing
> about that that I can see).

See AMD Volume 1, section 3.9.2.

> So, presumably, doing a dummy locked bus
> cycle operation on e.g. the top of the stack, such as Linux does, would
> be sufficient to ensure read ordering.  Would you concur with that
> assessment?

Yes.

> What's really horrible here is that the 'old' value of the data being
> used is modified at location A something like 30 instructions prior to
> the instruction that updates the index (B).  I think this is a
> situation that can only occur in an HT configuration, and then only if
> the speculative read issued by the HT cpu is being held across
> 30 instructions executed by the primary cpu before the HT cpu issues the
> read of B.
>
> cpu #0                  cpu #1 (HT cpu on same die as cpu #0)
>
>                         speculatively read A
> write A                 (stalled)
> [30 instructions]       (stalled x 30)
> write B                 (stalled)
>                         read B
>                         see that B has been updated
>                         read A (get old value for A instead of new)
>
> Is that even possible?  Not only the 30 instruction latency, but also
> the fact that even with the shared pipeline you have a speculative read
> on the HT cpu surviving 30 instructions running on cpu #0 (but only one
> or two on the HT cpu)... even though they share the same pipeline.

Take a look at store buffers.  Reads have a higher priority than writes
on some CPUs, and data may even sit in a store buffer indefinitely
(where it cannot be observed by other CPUs).

Reading some of the Intel and AMD errata gives you a good picture.

Stephan
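
For illustration only, the A/B pattern discussed above (a writer stores
A and then bumps the index B; a reader checks B and then reads A) can be
sketched in portable C11.  This is a sketch under assumptions, not the
FreeBSD code: the names data_a, index_b, publish() and consume() are
hypothetical, and C11 fences stand in for the MFENCE or dummy locked
instruction mentioned in the thread.  Compilers targeting x86 typically
lower a seq_cst fence to MFENCE or to a locked instruction, but that is
a property of the compiler, not something this sketch guarantees.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state: data_a plays the role of location A,
 * index_b the role of the index B from the thread. */
static int data_a;          /* payload, written before B is updated */
static atomic_int index_b;  /* publication index, starts at 0 */

/* Writer side: store A, fence, then update B.
 * The fence keeps the store to A from being ordered after the
 * update of B as seen by a reader that also fences. */
void publish(int value)
{
    data_a = value;
    atomic_thread_fence(memory_order_seq_cst);
    atomic_fetch_add_explicit(&index_b, 1, memory_order_relaxed);
}

/* Reader side: read B, fence, then read A.
 * Returns false if B has not moved past last_seen; otherwise stores
 * the current value of A in *out and returns true.  The fence keeps
 * the read of A from being satisfied before the read of B.
 * Assumption: a single writer that publishes once per check, so A is
 * not being rewritten while the reader copies it. */
bool consume(int last_seen, int *out)
{
    int b = atomic_load_explicit(&index_b, memory_order_relaxed);

    if (b == last_seen)
        return false;
    atomic_thread_fence(memory_order_seq_cst);
    *out = data_a;
    return true;
}
```

Without the fence in consume(), nothing in the C11 model stops the read
of A from being performed ahead of the read of B, which is the "old
value for A" outcome in the diagram above.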