I've been tracking down a crash one of our users gets occasionally. He has a quad Intel(R) XEON(TM) CPU 2.00GHz (1996.61-MHz 686-class CPU) system.
After getting a few of these crashes he pulled three of the four cpus out. But with just one physical cpu, with HTT turned on (so two logical cpus), he is still getting these crashes.

This is the sequence that causes the bad data:

    cpu #0              (HT)cpu #1

    write A
    write B
                        read B
                        if (B)
                            read A  <---- gets OLD data in A, not new data

Now I was depending on the presumed write ordering, so if a foreign cpu sees that B is updated it can assume that A has also been updated. But I'm beginning to think that it isn't working as advertised.

I've read the manuals over and over again and they seem to only guarantee write ordering between physical cpus, not between logical HT cpus, and even then it appears that a cpu can do a speculative read and thus get an old value for A even after getting a new value for B.

I looked at the various SFENCE/LFENCE/MFENCE instructions and they do not seem to guarantee ordering for speculative accesses at all. They all say that they do not protect against speculative reads. Bus-locked instructions don't seem to avoid speculative reads either.

I'm even more confused because this bug is occurring between two logical cpus on the same physical die. Is write ordering not guaranteed with respect to the other logical cpu? Can one logical cpu prefetch data early that then becomes obsolete by the time the instruction is actually run? Or perhaps it's a pipeline bug... I just don't know. But it's damn annoying.

The only solution I see is to use an actual serializing instruction like cpuid. I really do not want to have to use cpuid :-(.

So, has anyone seen anything similar?

					-Matt
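
For concreteness, here is a minimal userland sketch of the access pattern described above, written with C11 atomics and pthreads rather than the actual kernel code. The names (a, b, writer, reader, run_once) are hypothetical and only for illustration. The release/acquire pair states the ordering being relied on: if the reader sees the new B, it must also see the new A. It is not a claim about what the hardware in question actually does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical names: 'a' is the payload, 'b' is the flag that publishes it. */
static atomic_int a;
static atomic_int b;

/* cpu #0: write A, then write B.  The release store to 'b' may not
 * become visible before the earlier store to 'a'. */
static void *writer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&a, 42, memory_order_relaxed);
    atomic_store_explicit(&b, 1, memory_order_release);
    return NULL;
}

/* cpu #1: read B; if it is set, read A.  The acquire load of 'b' keeps
 * the later load of 'a' from being satisfied with an older value. */
static void *reader(void *arg)
{
    int *seen = arg;

    if (atomic_load_explicit(&b, memory_order_acquire) != 0)
        *seen = atomic_load_explicit(&a, memory_order_relaxed);
    else
        *seen = -1;     /* B not yet visible, so A was never read */
    return NULL;
}

/* Run one writer/reader pair.  Returns the value of A the reader saw,
 * or -1 if the reader ran before B was set. */
static int run_once(void)
{
    pthread_t w, r;
    int seen = -1;
    int error;

    atomic_store(&a, 0);
    atomic_store(&b, 0);

    error = pthread_create(&r, NULL, reader, &seen);
    assert(error == 0);
    error = pthread_create(&w, NULL, writer, NULL);
    assert(error == 0);

    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

Calling run_once() in a loop should only ever return -1 or 42 if the ordering holds. A return of 0 would be the "new B, old A" case from the diagram.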