On Wed, 24 Oct 2007 21:29:56 -0700 "David Schwartz" <[EMAIL PROTECTED]> wrote:
> > Well that's exactly right. For threaded programs (and maybe even
> > real-world non-threaded ones in general), you don't want to be
> > even _reading_ global variables if you don't need to. Cache misses
> > and cacheline bouncing could easily cause performance to completely
> > tank in some cases while only gaining a cycle or two in
> > microbenchmarks for doing these funny x86 predication things.
>
> For some CPUs, replacing a conditional branch with a conditional
> move is a *huge* win because it cannot be mispredicted.

Please name one... Hint: it's not one made by either Intel or AMD in
the last 4 years...

> In general,
> compilers should optimize for unshared data since that's much more
> common in typical code. Even for shared data, the usual case is that
> you are going to access the data few times, so pulling the cache line
> to the CPU is essentially free since it will happen eventually.

It's not about pulling it to the CPU, it's about pulling it *out* of
all the other CPUs AS WELL (and writing it back to memory, taking away
memory bandwidth).

-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
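For readers following the thread, the two code shapes being compared can be sketched roughly as below. This is only an illustration: shared_count, store_if and store_always are hypothetical names that do not come from the thread, and volatile is there solely so the compiler keeps every access in this sketch. The first function writes the global only when the condition holds. The second is the branch-free shape a conditional move would give, which reads and writes the global on every call, so the cache line gets pulled out of the other CPUs even when the value does not change.

```c
#include <stdbool.h>

/* Hypothetical shared global, standing in for any variable that other
 * threads may touch. volatile is an assumption of this sketch: it only
 * stops the compiler from removing or merging the accesses. */
volatile int shared_count;

/* Branching form: when cond is false the global is not touched at all,
 * so no other CPU loses its copy of the cache line. */
void store_if(bool cond, int value)
{
    if (cond)
        shared_count = value;
}

/* Branch-free form (what a conditional move amounts to): select the
 * value without a branch, then store unconditionally. */
void store_always(bool cond, int value)
{
    int old = shared_count;            /* always reads the global  */
    shared_count = cond ? value : old; /* always writes the global */
}
```

Both functions leave the same value in shared_count when run on a single thread. The difference is only in the memory traffic: store_always dirties the cache line on every call.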