New submission from tehybel: I'll describe 7 issues in the /Modules/_elementtree.c module here. They include multiple use-after-frees, type confusions and instances of out-of-bounds array indexing.
Issue 1: use-after-free in element_get_text The problematic code looks like this: LOCAL(PyObject*) element_get_text(ElementObject* self) { /* return borrowed reference to text attribute */ PyObject* res = self->text; if (JOIN_GET(res)) { res = JOIN_OBJ(res); if (PyList_CheckExact(res)) { res = list_join(res); if (!res) return NULL; self->text = res; } } return res; } As we can see, if res is a list, we call list_join with res, which is self->text. list_join will decrease self->text's reference count. When that happens, arbitrary python code can run. If that code uses self->text, a use-after-free occurs. PoC (Proof-of-Concept segfaulting script): --- import _elementtree as et class X(str): def __del__(self): print(elem.text) b = et.TreeBuilder() b.start("test") b.data(["AAAA", X("BBBB")]) b.start("test2") elem = b.close() print(elem.text) --- Issue 2: use-after-free in element_get_tail The same type of issue also exists in element_get_tail and should be fixed there, too. Issue 3: type confusion in elementiter_next The function elementiter_next iterates over a tree consisting of elements. Each element has an array of children. Before doing any casts, most of the elementtree code is careful to check that these children are, indeed, elements; that is, that their type is correct. The problem is that elementiter_next does not validate these child objects before performing a cast. Here is the relevant line: elem = (ElementObject *)cur_parent->extra->children[child_index]; If the child is not an element, a type confusion occurs. Here's a PoC: ----- import _elementtree as et state = { "tag": "tag", "_children": [1,2,3], "attrib": "attr", "tail": "tail", "text": "text", } e = et.Element("ttt") e.__setstate__(state) for x in e.iter(): print(x) ----- Issue 4: array-out-of-bounds indexing in element_subscr This issue occurs when taking the subscript of an element. This is done using the element_subscr function. The function calls PySlice_GetIndicesEx: if (PySlice_GetIndicesEx(item, self->extra->length, &start, &stop, &step, &slicelen) < 0) { return NULL; } The problem is that to evaluate the indices, PySlice_GetIndicesEx might call back into python code. That code might cause the element's self->extra->length to change. If this happens, the variables "start", "stop" and "step" might no longer be appropriate. The code then uses these variables for array indexing: for (cur = start, i = 0; i < slicelen; cur += step, i++) { PyObject* item = self->extra->children[cur]; ... } But this could go out of bounds and interpret arbitrary memory as a PyObject. Here's a PoC that reproduces this: --- import _elementtree as et class X: def __index__(self): e[:] = [] return 1 e = et.Element("elem") for _ in range(10): e.insert(0, et.Element("child")) print(e[0:10:X()]) --- Running it results in a segfault: (gdb) r ./poc14.py Starting program: /home/xx/Python-3.5.2/python ./poc14.py Program received signal SIGSEGV, Segmentation fault. 0x000000000049f933 in PyObject_Repr (v=0x7ffff68af058) at Objects/object.c:471 471 if (Py_TYPE(v)->tp_repr == NULL) (gdb) print *v $37 = {_ob_next = 0xdbdbdbdbdbdbdbdb, _ob_prev = 0xdbdbdbdbdbdbdbdb, ob_refcnt = 0xdbdbdbdbdbdbdbdc, ob_type = 0xdbdbdbdbdbdbdbdb} As we can see, "v" is freed memory with arbitrary contents. Issue 5: array-out-of-bounds indexing in element_ass_subscr A separate issue of the same type, also due to a call to PySlice_GetIndicesEx, exists in element_ass_subscr. Here's a proof-of-concept script for that: --- import _elementtree as et class X: def __index__(self): e[:] = [] return 1 e = et.Element("elem") for _ in range(10): e.insert(0, et.Element("child")) e[0:10:X()] = [] --- To fix these, I believe we could check whether self->extra->length changed after calling PySlice_GetIndicesEx, and bail out if so. (You can grep the codebase for "changed size during iteration" for examples of some similarish cases.) Issue 6: use-after-free in treebuilder_handle_start In the function treebuilder_handle_start we see these lines: if (treebuilder_set_element_text(self->last, self->data)) return NULL; Here self->last is the most recent element, and we are setting its text to self->data. This assignment happens via the function treebuilder_set_element_text which in turn calls treebuilder_set_element_text_or_tail. There, if the element self->last is not an exact instance of Element_Type, we run these lines: PyObject *joined = list_join(data); int r; if (joined == NULL) return -1; r = _PyObject_SetAttrId(element, name, joined); Py_DECREF(joined); return r; The comment for list_join says this: /* join list elements (destroying the list in the process) */ So note that we destroy the given list. If we go back to the initial function, treebuilder_handle_start, then we recall that "element" is really self->last and "data" is actually self->data. When list_join(data) is called, self->data will therefore be destroyed. So we have to be careful and ensure that self->data is overwritten; otherwise it could get used after being freed. Unfortunately the call to _PyObject_SetAttrId could fail -- for example if "element" is an integer. Then treebuilder_handle_start just bails out without clearing self->data. This results in a use-after-free. Here's a PoC: --- import _elementtree as et b = et.TreeBuilder(element_factory=lambda x, y: 123) b.start("tag", {}) b.data("ABCD") b.start("tag2", {}) --- It sets self->last to be an integer, 123. When we try to set the attribute "text" on this integer, it fails. At that point, self->data has been freed but the reference isn't cleared. When treebuilder_dealloc is called by the garbage collector, self->data is excessively decref'd and we get a segfault: (gdb) run ./poc16.py Traceback (most recent call last): File "./poc16.py", line 7, in <module> b.start("tag2", {}) AttributeError: 'int' object has no attribute 'text' Program received signal SIGSEGV, Segmentation fault. 0x0000000000435122 in visit_decref (op=0x7ffff6d52380, data=0x0) at Modules/gcmodule.c:373 373 if (PyObject_IS_GC(op)) { (gdb) print *op $52 = {_ob_next = 0xdbdbdbdbdbdbdbdb, _ob_prev = 0xdbdbdbdbdbdbdbdb, ob_refcnt = 0xdbdbdbdbdbdbdbdb, ob_type = 0xdbdbdbdbdbdbdbdb} Note that this issue exists in two places inside treebuilder_handle_start -- the call to treebuilder_set_element_tail has the same problem and both should be fixed. Issue 7: use-after-free in treebuilder_handle_end The same sort of issue also exists in treebuilder_handle_end and should be fixed there as well. All these issues were found in Python-3.5.2 and tested with --with-pydebug enabled. I did not check the 2.7 branch, but some (though not all) of the issues likely exist there as well. ---------- components: Extension Modules messages: 273662 nosy: tehybel priority: normal severity: normal status: open title: multiple issues in _elementtree module versions: Python 2.7, Python 3.5, Python 3.6 _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue27863> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com