[issue44921] dict subclassing is slow

Marco Sulla Sun, 15 Aug 2021 13:11:10 -0700


New submission from Marco Sulla <[email protected]>:


I asked on SO why subclassing dict makes the subclass much slower in some 
operations. This is the answer by Monica 
(https://stackoverflow.com/a/59914459/1763602):

Indexing and "in" are slower in dict subclasses because of a bad interaction 
between a dict optimization and the logic subclasses use to inherit C slots. 
This should be fixable, though not from your end.
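
A rough way to observe the slowdown is to time the same lookups on a dict and 
on a trivial subclass. This is a minimal sketch, not part of the original 
answer: the class name A follows the pickle example further down, and 
time_lookups is a hypothetical helper. Absolute numbers and the size of the gap 
depend on the interpreter version and the machine.

```python
import timeit


class A(dict):
    """Trivial dict subclass that overrides nothing."""


def time_lookups(number=200_000):
    """Time indexing and membership tests on dict and on A (hypothetical helper)."""
    data = {i: i for i in range(5)}
    timings = {}
    for label, mapping in (("dict", dict(data)), ("A", A(data))):
        env = {"m": mapping}
        timings[label, "m[3]"] = timeit.timeit("m[3]", globals=env, number=number)
        timings[label, "3 in m"] = timeit.timeit("3 in m", globals=env, number=number)
    return timings


if __name__ == "__main__":
    for (label, stmt), seconds in time_lookups().items():
        print(f"{label:>4}  {stmt:<7} {seconds:.4f}s")
```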

The CPython implementation has two sets of hooks for operator overloads. There 
are Python-level methods like __contains__ and __getitem__, but there's also a 
separate set of slots for C function pointers in the memory layout of a type 
object. Usually, either the Python method will be a wrapper around the C 
implementation, or the C slot will contain a function that searches for and 
calls the Python method. It's more efficient for the C slot to implement the 
operation directly, as the C slot is what Python actually accesses.
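
One visible consequence of operators going through the type's C slot is that a 
method set on an instance is ignored by the operator. A small sketch (the Box 
class is made up for illustration):

```python
class Box:
    def __contains__(self, item):
        return False


b = Box()
# Shadow the method on the instance only; the slot on the type is untouched.
b.__contains__ = lambda item: True

via_attribute = b.__contains__(1)  # True: plain attribute lookup finds the instance attribute
via_operator = 1 in b              # False: "in" goes through the slot on type(b)

# Assigning on the class is different: the slot function looks the method up on the type.
Box.__contains__ = lambda self, item: True
after_class_patch = 1 in b         # True

print(via_attribute, via_operator, after_class_patch)
```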

Mappings written in C implement the C slots sq_contains and mp_subscript to 
provide "in" and indexing. Ordinarily, the Python-level __contains__ and 
__getitem__ methods would be automatically generated as wrappers around the C 
functions, but the dict class has explicit implementations of __contains__ and 
__getitem__, because the explicit implementations 
(https://github.com/python/cpython/blob/v3.8.1/Objects/dictobject.c) are a bit 
faster than the generated wrappers:

static PyMethodDef mapp_methods[] = {
    DICT___CONTAINS___METHODDEF
    {"__getitem__", (PyCFunction)(void(*)(void))dict_subscript,
     METH_O | METH_COEXIST, getitem__doc__},
    ...

(Actually, the explicit __getitem__ implementation is the same function as the 
mp_subscript implementation, just with a different kind of wrapper.)

Ordinarily, a subclass would inherit its parent's implementations of C-level 
hooks like sq_contains and mp_subscript, and the subclass would be just as fast 
as the superclass. However, the logic in update_one_slot 
(https://github.com/python/cpython/blob/v3.8.1/Objects/typeobject.c#L7202) 
looks for the parent implementation by trying to find the generated wrapper 
methods through an MRO search.

dict doesn't have generated wrappers for sq_contains and mp_subscript, because 
it provides explicit __contains__ and __getitem__ implementations.
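
The difference can be seen from Python by inspecting descriptor types. This 
sketch is CPython-specific and relies on implementation details, so other 
implementations may report different types. Slots with a generated wrapper show 
up as wrapper descriptors ("slot wrapper"), while the explicitly defined methods 
show up as ordinary method descriptors:

```python
import types


class A(dict):
    pass


# __len__ has no explicit method entry, so CPython generates a slot wrapper for it.
len_is_wrapper = type(dict.__dict__["__len__"]) is types.WrapperDescriptorType

# __getitem__ and __contains__ come from dict's method table instead.
getitem_is_method = type(dict.__dict__["__getitem__"]) is types.MethodDescriptorType
contains_is_method = type(dict.__dict__["__contains__"]) is types.MethodDescriptorType

# The subclass defines neither name itself; attribute lookup finds dict's via the MRO.
defined_on_subclass = "__getitem__" in A.__dict__

print(len_is_wrapper, getitem_is_method, contains_is_method, defined_on_subclass)
```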

Instead of inheriting sq_contains and mp_subscript, update_one_slot ends up 
giving the subclass sq_contains and mp_subscript implementations that perform 
an MRO search for __contains__ and __getitem__ and call those. This is much 
less efficient than inheriting the C slots directly.

Fixing this will require changes to the update_one_slot implementation.

Aside from what I described above, dict_subscript also looks up __missing__ for 
dict subclasses, so fixing the slot inheritance issue won't make subclasses 
completely on par with dict itself for lookup speed, but it should get them a 
lot closer.
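
For reference, this is the __missing__ hook that indexing has to account for in 
subclasses. A short illustration (ZeroDefault is a made-up class):

```python
class ZeroDefault(dict):
    def __missing__(self, key):
        # Called by dict's subscript code when the key is absent.
        return 0


d = ZeroDefault(a=1)
present = d["a"]       # 1
absent = d["b"]        # 0, supplied by __missing__
contained = "b" in d   # False: membership tests do not consult __missing__
via_get = d.get("b")   # None: dict.get does not consult __missing__ either

print(present, absent, contained, via_get)
```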

As for pickling, on the dumps side, the pickle implementation has a dedicated 
fast path 
(https://github.com/python/cpython/blob/v3.8.1/Modules/_pickle.c#L4291) for 
dicts, while the dict subclass takes a more roundabout path through 
object.__reduce_ex__ and save_reduce.

On the loads side, the time difference is mostly just from the extra opcodes 
and lookups to retrieve and instantiate the __main__.A class, while dicts have 
a dedicated pickle opcode for making a new dict. If we compare the disassembly 
for the pickles:

In [26]: pickletools.dis(pickle.dumps({0: 0, 1: 1, 2: 2, 3: 3, 4: 4}))
    0: \x80 PROTO      4
    2: \x95 FRAME      25
   11: }    EMPTY_DICT
   12: \x94 MEMOIZE    (as 0)
   13: (    MARK
   14: K        BININT1    0
   16: K        BININT1    0
   18: K        BININT1    1
   20: K        BININT1    1
   22: K        BININT1    2
   24: K        BININT1    2
   26: K        BININT1    3
   28: K        BININT1    3
   30: K        BININT1    4
   32: K        BININT1    4
   34: u        SETITEMS   (MARK at 13)
   35: .    STOP
highest protocol among opcodes = 4

In [27]: pickletools.dis(pickle.dumps(A({0: 0, 1: 1, 2: 2, 3: 3, 4: 4})))
    0: \x80 PROTO      4
    2: \x95 FRAME      43
   11: \x8c SHORT_BINUNICODE '__main__'
   21: \x94 MEMOIZE    (as 0)
   22: \x8c SHORT_BINUNICODE 'A'
   25: \x94 MEMOIZE    (as 1)
   26: \x93 STACK_GLOBAL
   27: \x94 MEMOIZE    (as 2)
   28: )    EMPTY_TUPLE
   29: \x81 NEWOBJ
   30: \x94 MEMOIZE    (as 3)
   31: (    MARK
   32: K        BININT1    0
   34: K        BININT1    0
   36: K        BININT1    1
   38: K        BININT1    1
   40: K        BININT1    2
   42: K        BININT1    2
   44: K        BININT1    3
   46: K        BININT1    3
   48: K        BININT1    4
   50: K        BININT1    4
   52: u        SETITEMS   (MARK at 31)
   53: .    STOP
highest protocol among opcodes = 4

we see that the difference between the two is that the second pickle needs a 
whole bunch of opcodes to look up __main__.A and instantiate it, while the 
first pickle just does EMPTY_DICT to get an empty dict. After that, both 
pickles push the same keys and values onto the pickle operand stack and run 
SETITEMS.
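
The same comparison can be made programmatically instead of reading the two 
listings side by side. This is a sketch that assumes it is run as a script, so 
that A is importable as __main__.A; opcode_names is a hypothetical helper built 
on pickletools.genops, and protocol 4 is chosen to match the listings above.

```python
import pickle
import pickletools


class A(dict):
    pass


def opcode_names(obj, protocol=4):
    """Return the opcode names of obj's pickle, in order (hypothetical helper)."""
    payload = pickle.dumps(obj, protocol=protocol)
    return [opcode.name for opcode, arg, pos in pickletools.genops(payload)]


items = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
plain = opcode_names(items)
sub = opcode_names(A(items))

# Opcodes that appear only in the subclass pickle.
extra = [name for name in sub if name not in plain]
print(len(plain), len(sub))
print(extra)
```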

----------
components: C API
messages: 399625
nosy: Marco Sulla
priority: normal
severity: normal
status: open
title: dict subclassing is slow
type: performance
versions: Python 3.9

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue44921>
_______________________________________