INADA Naoki added the comment:

The current code and my patch call insertdict_clean() or insert_index() once for each entry. Serhiy's patch, on the other hand, calls build_indices() only once. That may be faster when the compiler doesn't inline the helper function. As a bonus, we can use memcpy to copy the entries.
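To make the two strategies concrete, here is a minimal sketch on a toy table. It is not the code from dictobject.c or from either patch: every name (toy_entry, toy_table, toy_insert_index, toy_build_indices, and so on) is hypothetical, entries hold plain ints rather than object pointers, and linear probing stands in for CPython's perturbation-based probing. It also assumes the index table always has at least one free slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_EMPTY (-1)

/* Hypothetical, simplified entry; real dict entries hold object pointers. */
typedef struct {
    size_t hash;
    int key;
    int value;
} toy_entry;

/* Hypothetical compact table: dense entry array plus sparse index table. */
typedef struct {
    size_t size;        /* number of index slots, a power of two */
    size_t used;        /* number of stored entries, always < size */
    int32_t *indices;   /* slot -> position in entries, or TOY_EMPTY */
    toy_entry *entries; /* dense array in insertion order */
} toy_table;

static void toy_clear(toy_table *t)
{
    for (size_t i = 0; i < t->size; i++)
        t->indices[i] = TOY_EMPTY;
    t->used = 0;
}

/* Per-entry helper: store pos in the first free slot for hash. */
static void toy_insert_index(toy_table *t, size_t hash, int32_t pos)
{
    size_t mask = t->size - 1;
    size_t i = hash & mask;
    while (t->indices[i] != TOY_EMPTY)
        i = (i + 1) & mask;          /* linear probing (an assumption) */
    t->indices[i] = pos;
}

/* Strategy A: a single pass with one helper call per entry. */
static void toy_copy_per_entry(toy_table *dst, const toy_entry *src, size_t n)
{
    toy_clear(dst);
    for (size_t i = 0; i < n; i++) {
        dst->entries[i] = src[i];
        toy_insert_index(dst, src[i].hash, (int32_t)i);
    }
    dst->used = n;
}

/* One call that fills the whole index table, probing inline. */
static void toy_build_indices(toy_table *t)
{
    size_t mask = t->size - 1;
    for (size_t pos = 0; pos < t->used; pos++) {
        size_t i = t->entries[pos].hash & mask;
        while (t->indices[i] != TOY_EMPTY)
            i = (i + 1) & mask;
        t->indices[i] = (int32_t)pos;
    }
}

/* Strategy B: two passes, a bulk memcpy and then one build call. */
static void toy_copy_two_pass(toy_table *dst, const toy_entry *src, size_t n)
{
    toy_clear(dst);
    memcpy(dst->entries, src, n * sizeof(toy_entry));
    dst->used = n;
    toy_build_indices(dst);
}

/* Returns 1 and stores the value if (hash, key) is present, else 0. */
static int toy_lookup(const toy_table *t, size_t hash, int key, int *value)
{
    size_t mask = t->size - 1;
    for (size_t i = hash & mask; t->indices[i] != TOY_EMPTY; i = (i + 1) & mask) {
        const toy_entry *e = &t->entries[t->indices[i]];
        if (e->hash == hash && e->key == key) {
            *value = e->value;
            return 1;
        }
    }
    return 0;
}

/* Returns 1 if both strategies build identical index tables. */
static int toy_demo(void)
{
    toy_entry src[3] = { {1, 10, 100}, {9, 20, 200}, {2, 30, 300} };
    int32_t idx_a[8], idx_b[8];
    toy_entry ent_a[8], ent_b[8];
    toy_table a = { 8, 0, idx_a, ent_a };
    toy_table b = { 8, 0, idx_b, ent_b };
    int value = 0;

    toy_copy_per_entry(&a, src, 3);
    toy_copy_two_pass(&b, src, 3);

    if (memcmp(idx_a, idx_b, sizeof idx_a) != 0)
        return 0;
    if (!toy_lookup(&b, 9, 20, &value) || value != 200)
        return 0;
    return 1;
}
```

Both functions produce the same table. The difference is only in how the work is arranged: strategy A touches each entry once but pays a call per entry unless the helper is inlined, while strategy B reads the copied entries a second time in exchange for a bulk copy and a single call.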
The downside of Serhiy's patch is that it makes two passes. If the entries don't fit in L2 cache, the second pass may need more fetches from L3 cache. So I can't say that Serhiy's patch is faster until it has been benchmarked. (My forecast is no performance difference between my patch and Serhiy's on amd64 machines, and Serhiy's patch being faster on weaker CPUs.)

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28199>
_______________________________________