Inada Naoki <songofaca...@gmail.com> added the comment:
In most cases, the first PyDict_SetItem() decides which format should be used. But _PyDict_NewPresized() can be a problem: it creates a hash table before the first key is inserted, when 5 < (expected size) < 87382.

In the CPython code base, _PyDict_NewPresized() is called from three places:

1. call.c: building a kwargs dict -- all keys should be Unicode.
2. ceval.c: BUILD_MAP and BUILD_CONST_KEY_MAP -- there is no guarantee that all keys are Unicode.

The current pull request assumes the dict keys are Unicode-only. So building a dict from non-Unicode keys becomes slower:

```
$ ./python -m pyperf timeit --compare-to ../cpython/python -- '{(1,2):3, (4,5):6, (7,8):9, (10,11):12, (13,14):15, (16,17):18}'
/home/inada-n/work/python/cpython/python: ..................... 233 ns +- 1 ns
/home/inada-n/work/python/dict-compact/python: ..................... 328 ns +- 6 ns

Mean +- std dev: [/home/inada-n/work/python/cpython/python] 233 ns +- 1 ns -> [/home/inada-n/work/python/dict-compact/python] 328 ns +- 6 ns: 1.41x slower
```

There are some approaches to fix this problem:

1. Don't use _PyDict_NewPresized() in BUILD_MAP and BUILD_CONST_KEY_MAP:

```
$ ./python -m pyperf timeit --compare-to ../cpython/python -- '{(1,2):3, (4,5):6, (7,8):9, (10,11):12, (13,14):15, (16,17):18}'
/home/inada-n/work/python/cpython/python: ..................... 233 ns +- 1 ns
/home/inada-n/work/python/dict-compact/python: ..................... 276 ns +- 1 ns

Mean +- std dev: [/home/inada-n/work/python/cpython/python] 233 ns +- 1 ns -> [/home/inada-n/work/python/dict-compact/python] 276 ns +- 1 ns: 1.18x slower
```

   I think this performance regression is an acceptable level.

2. Add a `unicode` argument to _PyDict_NewPresized(). -- Breaks some third-party code using internal APIs.

3. Add a new internal C API such as _PyDict_NewPresizedUnicodeKey(). -- Most conservative.

4. Add a new internal C API that creates a dict from keys and values, for extreme performance, like this:

```
// Create a new dict from keys and values.
// Items are received as
// `{keys[i*keys_offset]: values[i*values_offset] for i in range(length)}`.
// When distinct=1, this function skips checking for duplicated keys.
// So don't pass distinct=1 unless you can guarantee that there are no
// duplicated keys.
PyObject *
PyDict_FromKeysAndValues(PyObject **keys, Py_ssize_t keys_offset,
                         PyObject **values, Py_ssize_t values_offset,
                         Py_ssize_t length, int distinct)
{
}
```

----------
_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue46845>
_______________________________________
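For clarity, the semantics described in the C comment of option 4 can be modeled in pure Python. The helper name and the strided-array usage below are illustrative, not part of the proposal:

```
def dict_from_keys_and_values(keys, keys_offset, values, values_offset, length):
    # Mirrors the documented semantics:
    # {keys[i*keys_offset]: values[i*values_offset] for i in range(length)}
    # The offsets (strides) let a caller pass an interleaved buffer,
    # e.g. a value stack laid out as k0, v0, k1, v1, ...
    return {keys[i * keys_offset]: values[i * values_offset]
            for i in range(length)}

# Interleaved layout like BUILD_MAP's operand stack: k0, v0, k1, v1, ...
stack = [(1, 2), 3, (4, 5), 6, (7, 8), 9]
d = dict_from_keys_and_values(stack, 2, stack[1:], 2, 3)
print(d)  # {(1, 2): 3, (4, 5): 6, (7, 8): 9}
```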