New submission from Raymond Hettinger: The dictresize() method unnecessarily copies the keys, values, and hashes as well as making insertions in the index table. Only the latter step is necessary or desirable.
Here in the pure Python code for resizing taking from the original proof-of-concept code at https://code.activestate.com/recipes/578375 def _resize(self, n): '''Reindex the existing hash/key/value entries. Entries do not get moved, they only get new indices. No calls are made to hash() or __eq__(). ''' n = 2 ** n.bit_length() # round-up to power-of-two self.indices = self._make_index(n) for index, hashvalue in enumerate(self.hashlist): for i in Dict._gen_probes(hashvalue, n-1): if self.indices[i] == FREE: break self.indices[i] = index self.filled = self.used And here is a rough sketch of what it would look like in the C code (very rough, not yet compileable): static void insert_indices_clean(PyDictObject *mp, Py_hash_t hash) { size_t i, perturb; PyDictKeysObject *k = mp->ma_keys; size_t mask = (size_t)DK_SIZE(k)-1; i = hash & mask; for (perturb = hash; dk_get_index(k, i) != DKIX_EMPTY; perturb >>= PERTURB_SHIFT) { i = mask & ((i << 2) + i + perturb + 1); } dk_set_index(k, i, k->dk_nentries); } static int dictresize(PyDictObject *mp, Py_ssize_t minused) { Py_ssize_t i, newsize; PyDictKeyEntry *ep0; /* Find the smallest table size > minused. */ for (newsize = PyDict_MINSIZE; newsize <= minused && newsize > 0; newsize <<= 1) ; if (newsize <= 0) { PyErr_NoMemory(); return -1; } /* Resize and zero-out the indices array */ realloc(dk->dk_indices, es * newsize); memset(&dk->dk_indices.as_1[0], 0xff, es * size); dk->dk_size = size; /* Loop over hashes, skipping NULLs, inserting new indices */ for (i = 0; i < mp->dk_nentries; i++) { PyDictKeyEntry *ep = &ep0[i]; if (ep->me_value != NULL) { insert_indices_clean(mp, ep->me_hash); } } return 0; } ---------- components: Interpreter Core messages: 276921 nosy: methane, rhettinger, serhiy.storchaka priority: normal severity: normal status: open title: Compact dict resizing is doing too much work versions: Python 3.6 _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue28199> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com