INADA Naoki added the comment:

> I think that clearing 120 bytes at a time is faster than clear it later
> entry-by-entry.
Ah, my wording was wrong. This patch skips the zero clear entirely. In pseudo code:

    // When allocating PyDictKeysObject.
    - memset(dk_entries, 0, sizeof(dk_entries));

    // When inserting a new item.
    n = dk_nentries++;
    e = &dk_entries[n];
    e->me_hash = hash;
    e->me_key = key;
    if (split_table) {
    +   e->me_value = NULL;
        ma_values[n] = value;
    }
    else {
        e->me_value = value;
    }

> Your patch removes some asserts, this looks not good.

This patch fills dk_entries with 0xcc when Py_DEBUG is enabled. That can
catch unintentional access to a value that comes from reused memory. I'll
search for more points where I can insert effective asserts.

> Could you provide microbenchmarks that show the largest speed up and the
> largest slow down? So we would see what type of code gets the benefit.

Avoiding cache pollution is more important than avoiding the 120-byte
memset in this case. It's difficult to write a simple micro benchmark
that shows the effects of cache pollution...

    $ ./python-patched -m perf timeit --rigorous --compare-to `pwd`/python-default --duplicate 8 -- '{}'
    python-default: ......................................... 44.6 ns +- 2.4 ns
    python-patched: ......................................... 44.1 ns +- 1.8 ns

    Median +- std dev: [python-default] 44.6 ns +- 2.4 ns -> [python-patched] 44.1 ns +- 1.8 ns: 1.01x faster

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28832>
_______________________________________