New submission from tehybel:

Here I'll describe five distinct issues I found. Common to them all is that they reside in the built-in dictionary object.
Four of them are use-after-frees and one is an array-out-of-bounds indexing bug. All of the described functions reside in /Objects/dictobject.c.


Issue 1: use-after-free when initializing a dictionary

Initialization of a dictionary happens via the function dict_init, which calls dict_update_common. From there, PyDict_MergeFromSeq2 may be called, and that is where this issue resides.

In PyDict_MergeFromSeq2 we retrieve a sequence of size 2 with this line:

    fast = PySequence_Fast(item, "");

After checking its size, we take out a key and value:

    key = PySequence_Fast_GET_ITEM(fast, 0);
    value = PySequence_Fast_GET_ITEM(fast, 1);

Then we call PyDict_GetItem. This calls back into Python code if the key's class defines a __hash__ method. From there the "item" sequence could get modified, resulting in "key" or "value" getting used after having been freed.

Here's a PoC:

---
class X:
    def __hash__(self):
        pair[:] = []
        return 13

pair = [X(), 123]
dict([pair])
---

It crashes while trying to use freed memory as a PyObject:

(gdb) run ./poc24.py

Program received signal SIGSEGV, Segmentation fault.
0x000000000048fe25 in insertdict (mp=mp@entry=0x7ffff6d5c4b8, key=key@entry=0x7ffff6d52538, hash=0xd, value=value@entry=0x8d1ac0 <small_ints+6144>) at Objects/dictobject.c:831
831         MAINTAIN_TRACKING(mp, key, value);
(gdb) print *key
$26 = {_ob_next = 0xdbdbdbdbdbdbdbdb, _ob_prev = 0xdbdbdbdbdbdbdbdb, ob_refcnt = 0xdbdbdbdbdbdbdbdb, ob_type = 0xdbdbdbdbdbdbdbdb}


Issue 2: use-after-free in dictitems_contains

In the function dictitems_contains we call PyDict_GetItem to look up a value in the dictionary:

    found = PyDict_GetItem((PyObject *)dv->dv_dict, key);

However, "found" is a borrowed reference. We then go ahead and compare it:

    return PyObject_RichCompareBool(value, found, Py_EQ);

But PyObject_RichCompareBool can call back into Python code (for example a user-defined __eq__), which may mutate the dictionary directly or release the GIL and let another thread mutate it. Thus "found" could get freed.
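The re-entry point can be observed without triggering the crash. The sketch below assumes CPython 3 and uses a hypothetical `Probe` class, which is not part of the report. Its `__eq__` records its argument instead of clearing the dict. The sketch shows that a membership test on `d.items()` hands control to user code while the looked-up value is being compared.

```python
class Probe:
    """Hypothetical helper: records every __eq__ call and never mutates anything."""

    def __init__(self):
        self.seen = []

    def __eq__(self, other):
        # Called during the comparison step of the items-view membership test,
        # after the value stored under the key has already been looked up.
        self.seen.append(other)
        return NotImplemented  # let the comparison fall back, as the PoC does


d = {0: set()}
probe = Probe()
result = (0, probe) in d.items()

print(result)      # False: neither side claims equality
print(probe.seen)  # [set()]: __eq__ received the value stored under key 0
```

If `__eq__` called `d.clear()` here, as in the PoC, the looked-up value could be released while the comparison is still in progress.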
Then, inside PyObject_RichCompareBool (actually in do_richcompare), the "found" variable gets used after being freed.

PoC:

---
class X:
    def __eq__(self, other):
        d.clear()
        return NotImplemented

d = {0: set()}
(0, X()) in d.items()
---

Result:

(gdb) run ./poc25.py

Program received signal SIGSEGV, Segmentation fault.
0x00000000004a03b6 in do_richcompare (v=v@entry=0x7ffff6d52468, w=w@entry=0x7ffff6ddf7c8, op=op@entry=0x2) at Objects/object.c:673
673         if (!checked_reverse_op && (f = w->ob_type->tp_richcompare) != NULL) {
(gdb) print w->ob_type
$26 = (struct _typeobject *) 0xdbdbdbdbdbdbdbdb


Issue 3: use-after-free in dict_equal

In the function dict_equal, we call the "lookdict" function via b->ma_keys->dk_lookup to look up a value:

    if ((b->ma_keys->dk_lookup)(b, key, ep->me_hash, &vaddr) == NULL)

This value's address is stored into the "vaddr" variable and the value is fetched into the "bval" variable:

    bval = *vaddr;

Then we call Py_DECREF(key), which can call back into Python code. That code could mutate dictionary b, or release the GIL and let another thread do so. Therefore "bval" could be freed at this point. We then proceed to use "bval":

    cmp = PyObject_RichCompareBool(aval, bval, Py_EQ);

This results in a use-after-free.

PoC:

---
class X():
    def __del__(self):
        dict_b.clear()
    def __eq__(self, other):
        dict_a.clear()
        return True
    def __hash__(self):
        return 13

dict_a = {X(): 0}
dict_b = {X(): X()}
dict_a == dict_b
---

Result:

(gdb) run ./poc26.py

Program received signal SIGSEGV, Segmentation fault.
PyType_IsSubtype (a=0xdbdbdbdbdbdbdbdb, b=0x87ec60 <PyLong_Type>) at Objects/typeobject.c:1343
1343        mro = a->tp_mro;
(gdb) print a
$59 = (PyTypeObject *) 0xdbdbdbdbdbdbdbdb


Issue 4: use-after-free in _PyDict_FromKeys

The function _PyDict_FromKeys takes an iterable as argument. If the iterable is a dict, _PyDict_FromKeys loops over it like this:

    while (_PyDict_Next(iterable, &pos, &key, &oldvalue, &hash)) {
        if (insertdict(mp, key, hash, value)) {
            ...
        }
    }

However, if we look at the comment for PyDict_Next, we see this:

 * CAUTION:  In general, it isn't safe to use PyDict_Next in a loop that
 * mutates the dict.

But insertdict can call back into Python code, which might mutate the dict. In that case we perform a use-after-free of the "key" variable.

Here's a PoC:

---
class X(int):
    def __hash__(self):
        return 13
    def __eq__(self, other):
        if len(d) > 1:
            d.clear()
        return False

d = {}
d = {X(1): 1, X(2): 2}
x = {}.fromkeys(d)
---

And the result:

(gdb) run ./poc27.py

Program received signal SIGSEGV, Segmentation fault.
0x0000000000435122 in visit_decref (op=0x7ffff6d5ca68, data=0x0) at Modules/gcmodule.c:373
373         if (PyObject_IS_GC(op)) {
(gdb) print *op
$115 = {_ob_next = 0xdbdbdbdbdbdbdbdb, _ob_prev = 0xdbdbdbdbdbdbdbdb, ob_refcnt = 0xdbdbdbdbdbdbdbdb, ob_type = 0xdbdbdbdbdbdbdbdb}

An almost identical issue also exists further down in the function when calling _PySet_NextEntry. To see this crash, just change "d" to be a set in the PoC above:

    d = set()
    d = set([X(1), X(2)])

This likewise crashes with a use-after-free.

(Note: if you grep for PyDict_Next you will find more similar cases, although many are in obscure modules or deprecated functions. I'm not sure those are worth fixing. For example, here's a crasher for BaseException_setstate, which also calls PyDict_Next:

---
class X(str):
    def __hash__(self):
        d.clear()
        return 13

d = {}
d[X()] = X()
e = Exception()
e.__setstate__(d)
---

end note.)


Issue 5: out-of-bounds indexing in dictiter_iternextitem

The function dictiter_iternextitem is used to iterate over a dictionary's items. dictiter_iternextitem is careful to check that the dictionary did not change size during iteration. However, after performing this check, it calls Py_DECREF:

    Py_DECREF(PyTuple_GET_ITEM(result, 0));
    Py_DECREF(PyTuple_GET_ITEM(result, 1));

This can execute Python code and mutate the dict. If that happens, the index "i" previously computed by dictiter_iternextitem could become invalid.
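In CPython, dropping the last reference to an object deallocates it immediately, and deallocation runs the object's `__del__` method. That is why a Py_DECREF can execute Python code. The sketch below is CPython-specific, because other implementations may defer finalization. Its `events` log is illustrative only. It shows `__del__` firing in the middle of a plain dict assignment, the same mechanism that lets the Py_DECREF calls above reach user code.

```python
events = []


class X(int):
    def __del__(self):
        # Runs synchronously when the last reference disappears (CPython refcounting).
        events.append(("__del__", int(self)))


d = {2: X(2)}  # the dict holds the only reference to X(2)
events.append("before assignment")
d[2] = None    # dict drops X(2): its refcount hits zero and __del__ runs here
events.append("after assignment")

print(events)
# ['before assignment', ('__del__', 2), 'after assignment']
```

If `__del__` called `d.clear()` instead of appending to a log, the dict would be emptied in the middle of the assignment, before the next statement runs.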
dictiter_iternextitem would then index out of bounds with this line:

    key = d->ma_keys->dk_entries[i].me_key;

Furthermore, the "value_ptr" variable would have gone stale, too. Taking the "value" variable out of it uses memory that has been freed:

    value = *value_ptr;

Here's a PoC which crashes with the "value" variable being an arbitrary pointer:

---
class X(int):
    def __del__(self):
        d.clear()

d = {i: X(i) for i in range(8)}

for result in d.items():
    if result[0] == 2:
        d[2] = None # free d[2] --> X(2).__del__ is called
---

The result:

(gdb) run ./poc29.py

Program received signal SIGSEGV, Segmentation fault.
dictiter_iternextitem (di=0x7ffff6d49cd8) at Objects/dictobject.c:3187
3187            Py_INCREF(key);
(gdb) print value
$12 = (PyObject *) 0x7b7b7b7b7b7b7b7b

----------
components: Interpreter Core
messages: 274275
nosy: tehybel
priority: normal
severity: normal
status: open
title: five dictobject issues

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27945>
_______________________________________
_______________________________________________
Python-bugs-list mailing list