Antoine Pitrou <pit...@free.fr> added the comment:

> My previous experiments along these lines showed it was a dead-end.
> The number of probes was the most important factor and beat-out any
> effort to improve cache utilization from increased density.  

Can you describe your experiments? What workloads or benchmarks did you
use?

Do note that there are several levels of caches in modern CPUs. L1 is
very fast (latency is 3 or 4 cycles) but rather small (32 or 64KB). L2,
depending on the CPU, has a latency between 10 and 20+ cycles and can be
256KB to 1MB in size. L3, when present, is much larger but also much
slower (latency sometimes up to 50 cycles).
So, even if access patterns are uneven, it is probably rare to have all
frequently accessed data in L1 (especially with Python since objects are
big).
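
To put rough numbers on the "objects are big" point, here is a minimal
sketch that estimates the memory footprint of a small dict and compares
it with a 32KB L1 data cache. The helper name approx_footprint and the
L1_BYTES constant are illustrative assumptions, and sys.getsizeof only
gives a shallow per-object size, so treat the result as a lower bound
rather than a measurement of real cache behaviour.

```python
import sys

# Assumption: 32KB L1 data cache, the low end of the range quoted above.
L1_BYTES = 32 * 1024

def approx_footprint(d):
    """Rough lower bound: the dict's own table plus each key and value object."""
    total = sys.getsizeof(d)
    for k, v in d.items():
        total += sys.getsizeof(k) + sys.getsizeof(v)
    return total

# 1000 distinct string keys mapped to 1000 distinct float objects.
d = {"key%d" % i: float(i) for i in range(1000)}
size = approx_footprint(d)
print("approx bytes:", size, "exceeds L1:", size > L1_BYTES)
```

Even at only a thousand entries, the table plus its keys and values
should not fit in L1 all at once.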

> Another result from earlier experiments is that benchmarking the
> experiment is laden with pitfalls.  Tight timing loops don't mirror
> real world programs, nor do access patterns with uniform random
> distributions.

I can certainly understand that; can you suggest workloads approaching
"real world programs"?
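
As a starting point for discussion, here is a hedged sketch of a lookup
benchmark that compares a uniform random access pattern with a skewed,
Zipf-like one. The function names, the 1/(i+1) weighting and the sizes
are assumptions of mine, not anything taken from the earlier
experiments, and this is still a tight timing loop, so it only addresses
the "uniform distribution" half of the objection.

```python
import random
import timeit

def make_keys(n):
    # Hypothetical key set; real programs would use their own keys.
    return ["key%d" % i for i in range(n)]

def access_sequence(keys, n_lookups, skewed, seed=0):
    """Return a list of keys to look up, uniform or Zipf-like."""
    rng = random.Random(seed)
    if skewed:
        # Zipf-like weights: the i-th key is drawn with weight 1/(i+1),
        # so a few keys receive most of the lookups.
        weights = [1.0 / (i + 1) for i in range(len(keys))]
        return rng.choices(keys, weights=weights, k=n_lookups)
    return rng.choices(keys, k=n_lookups)

def time_lookups(d, seq, repeat=3):
    """Best-of-N wall time for looking up every key in seq once."""
    def run():
        for k in seq:
            d[k]
    return min(timeit.repeat(run, number=1, repeat=repeat))

keys = make_keys(1000)
table = dict.fromkeys(keys, 0)
uniform_seq = access_sequence(keys, 10000, skewed=False)
skewed_seq = access_sequence(keys, 10000, skewed=True)
print("uniform:", time_lookups(table, uniform_seq))
print("skewed: ", time_lookups(table, skewed_seq))
```

Larger tables and key sequences recorded from an actual application
would be closer to a real workload than either synthetic pattern.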

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10408>
_______________________________________