New submission from Alexey Izbyshev <izbys...@ispras.ru>:
A failure of PyUnicode_AsUTF8AndSize() in various fromisoformat() functions in Modules/_datetimemodule.c leads to NULL dereference due to the missing check, e.g.:

>>> from datetime import date
>>> date.fromisoformat('\ud800')
Segmentation fault (core dumped)

This is similar to msg123474. The missing NULL check was reported by Svace static analyzer.

While preparing tests for this issue, I've discovered a deeper problem. The C datetime implementation uses PyUnicode_AsUTF8AndSize() in several places, making some functions reject strings containing surrogate code points (0xD800 - 0xDFFF) since they can't be encoded in UTF-8. On the other hand, the pure-Python datetime implementation doesn't have this restriction. For example:

>>> import sys
>>> sys.modules['_datetime'] = None  # block C implementation
>>> from datetime import time
>>> time().strftime('\ud800')
'\ud800'
>>> del sys.modules['datetime']
>>> del sys.modules['_datetime']
>>> from datetime import time
>>> time().strftime('\ud800')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed

My PR (coming soon) doesn't address this difference but focuses on fixing the immediate problem instead. Suggestions are appreciated.

----------
components: Extension Modules
messages: 323844
nosy: belopolsky, izbyshev, p-ganssle, serhiy.storchaka
priority: normal
severity: normal
status: open
title: datetime: NULL dereference in fromisoformat() on PyUnicode_AsUTF8AndSize() failure
type: crash
versions: Python 3.7, Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34454>
_______________________________________
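
For readers who want to see why strings with surrogate code points trip the UTF-8 conversion, here is a minimal pure-Python sketch. It does not touch the datetime module or the C code discussed in the report. The helper names contains_surrogate() and utf8_encodable() are hypothetical and exist only for this illustration.

```python
def contains_surrogate(text: str) -> bool:
    """Return True if text has any code point in the range U+D800..U+DFFF."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


def utf8_encodable(text: str) -> bool:
    """Return True if text can be encoded with the strict UTF-8 codec."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        # Strict UTF-8 refuses lone surrogates ("surrogates not allowed").
        return False
    return True


if __name__ == "__main__":
    for sample in ["2018-08-21", "\ud800", "2018-08-21\udfff10:00"]:
        print(ascii(sample), contains_surrogate(sample), utf8_encodable(sample))
```

Under these assumptions, a string fails strict UTF-8 encoding exactly when it contains a surrogate, which is the condition under which a UTF-8 conversion of the argument cannot succeed and its result has to be checked before use.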