INADA Naoki added the comment:

> I think that clearing 120 bytes at a time is faster than clear it later 
> entry-by-entry.

Ah, my wording was wrong. This patch skips the zero-clear entirely.
In pseudo code:

  // When allocating PyDictKeyObject.
- memset(dk_entries, 0, sizeof(dk_entries));

  // When inserting new item.
  n = dk_nentries++;
  e = &dk_entries[n];
  e->me_hash = hash;
  e->me_key = key;
  if (split_table) {
+     e->me_value = NULL;
      ma_values[n] = value;
  } else {
      e->me_value = value;
  }


> Your patch removes some asserts, this looks not good.

This patch fills dk_entries with 0xcc when Py_DEBUG is enabled.
That can catch unintentional accesses to values coming from reused memory.

I'll look for more places where I can insert effective asserts.


> Could you provide microbenchmarks that show the largest speed up and the 
> largest slow down? So we would see what type of code gets the benefit.

Avoiding cache pollution matters more than avoiding the 120-byte memset in this 
case.
It's difficult to write a simple microbenchmark that shows the effects of cache 
pollution...

$ ./python-patched -m perf timeit --rigorous --compare-to `pwd`/python-default --duplicate 8 -- '{}'
python-default: ......................................... 44.6 ns +- 2.4 ns
python-patched: ......................................... 44.1 ns +- 1.8 ns
Median +- std dev: [python-default] 44.6 ns +- 2.4 ns -> [python-patched] 44.1 ns +- 1.8 ns: 1.01x faster

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28832>
_______________________________________