Antoine Pitrou <pit...@free.fr> added the comment:

> My previous experiments along these lines showed it was a dead-end.
> The number of probes was the most important factor and beat out any
> effort to improve cache utilization from increased density.
Can you describe your experiments? Which workloads or benchmarks did you use?

Do note that there are several levels of cache in modern CPUs. L1 is very fast (a latency of 3 or 4 cycles) but rather small (32 or 64 KB). L2, depending on the CPU, has a latency between 10 and 20+ cycles and can be 256 KB to 1 MB in size. L3, when present, is considerably larger but also considerably slower (latency sometimes up to 50 cycles). So, even if access patterns are uneven, it is probably rare for all frequently accessed data to fit in L1 (especially with Python, since objects are big).

> Another result from earlier experiments is that benchmarking the
> experiment is laden with pitfalls. Tight timing loops don't mirror
> real world programs, nor do access patterns with uniform random
> distributions.

I can certainly understand that; can you suggest workloads approaching "real world programs"?

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10408>
_______________________________________
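For readers following along: the "number of probes" being discussed refers to the open-addressing probe sequence used by CPython's dict and set implementations. Below is a rough illustrative sketch of that probe loop (modeled on the scheme described in the comments of Objects/dictobject.c; the exact order of operations has varied between CPython versions, so treat this as an approximation, not the canonical code). The names `probe_sequence` and `PERTURB_SHIFT` here are just for illustration.

```python
# Rough sketch of a CPython-style open-addressing probe sequence
# (modeled on the perturb scheme described in Objects/dictobject.c;
# illustrative only, not the exact interpreter code).
def probe_sequence(hash_value, table_size, max_probes=8):
    """Yield the first few slot indices examined for a given hash.

    table_size must be a power of two, as in CPython's dict.
    """
    PERTURB_SHIFT = 5
    mask = table_size - 1
    perturb = hash_value
    i = hash_value & mask
    for _ in range(max_probes):
        yield i
        perturb >>= PERTURB_SHIFT
        i = (5 * i + perturb + 1) & mask

# Each extra probe is a potential uncached memory access, which is why
# probe count can dominate over cache-density improvements.
print(list(probe_sequence(0xDEADBEEF, 8)))
```

The perturb term mixes in the high bits of the hash so that keys colliding on the low bits still diverge after a few probes, at the cost of a less cache-friendly (non-linear) walk through the table.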
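To make the "uniform random distributions" pitfall concrete, here is a hypothetical benchmarking sketch (not from the tracker discussion) contrasting uniform-random lookups with a crudely skewed, Zipf-like pattern in which most lookups hit a small hot set. The 10%/90% split and the hot-set size of 100 are arbitrary assumptions for illustration.

```python
# Hypothetical sketch: uniform vs. skewed access patterns.
# Real programs tend to reuse a small hot set of keys (which stays in
# cache), so a uniform-random micro-benchmark can mislead.
import random
import timeit

N = 100_000
table = {i: i for i in range(N)}

uniform_keys = [random.randrange(N) for _ in range(N)]
# Crude Zipf-like skew: 90% of lookups hit a hot set of 100 keys.
skewed_keys = [random.randrange(N) if random.random() < 0.1
               else random.randrange(100)
               for _ in range(N)]

def lookups(keys):
    for k in keys:
        table[k]

print("uniform:", timeit.timeit(lambda: lookups(uniform_keys), number=10))
print("skewed: ", timeit.timeit(lambda: lookups(skewed_keys), number=10))
```

Whether the skewed run is measurably faster depends on table size relative to the cache hierarchy described above, which is exactly why the choice of workload matters.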