STINNER Victor <victor.stin...@haypocalc.com> added the comment:

On 09/12/2011 22:12, Stefan Krah wrote:
> The bottleneck in _decimal is (res is ascii):
>
>    PyUnicode_FromString(res);
>
>    PyUnicode_DecodeASCII(res) has the same performance.
>
>
> With this function ...
>
>   static PyObject*
>   unicode_fromascii(const char* s, Py_ssize_t size)
>   {
>       PyObject *res;
>       res = PyUnicode_New(size, 127);
>       if (!res)
>           return NULL;
>       memcpy(PyUnicode_1BYTE_DATA(res), s, size);
>       return res;
>   }
>
> ... I get the same performance as with Python 2.7 (5.85s)!

The problem is that unicode_fromascii() is unsafe: it doesn't check that the string is pure ASCII. That's why this function is private. Because of PEP 383, the ASCII and UTF-8 decoders (PyUnicode_DecodeASCII and PyUnicode_FromString) have to scan the input first to check for errors, and only then do a fast memcpy. The scanner of these two decoders is already optimized to process the input string word by word (a word being the C long type) instead of byte by byte, using a bit mask.

--

You can write your own super fast ASCII decoder in two lines:

    res = PyUnicode_New(size, 127);
    memcpy(PyUnicode_1BYTE_DATA(res), s, size);

(this is exactly what unicode_fromascii does)

> I think it would be really beneficial for C-API users to have
> more ascii low level functions that don't do error checking and
> are simply as fast as possible.

It is really important to ensure that an ASCII string doesn't contain characters outside [U+0000; U+007F], because many operations on ASCII strings are optimized (e.g. the UTF-8 pointer is shared with the ASCII pointer). I prefer not to expose such a function, or someone will use it without understanding exactly how dangerous it is.

Martin and others may disagree with me.

Do you know Murphy's Law? :-)
http://en.wikipedia.org/wiki/Murphy%27s_law

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13570>
_______________________________________
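
For readers who want to see what the word-by-word scan mentioned in this message might look like, here is a minimal standalone sketch in plain C. It is not the CPython implementation: the name is_pure_ascii() is hypothetical, and the use of memcpy() to load each word (to sidestep alignment concerns) is an assumption of this sketch. It only illustrates the idea of testing the high bit of every byte in a C long at once with a bit mask, then finishing the tail byte by byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the CPython C-API).
 * Returns 1 if every byte of s[0..size) is in [0x00, 0x7F], else 0. */
static int
is_pure_ascii(const char *s, size_t size)
{
    /* 0x80 repeated in every byte of an unsigned long,
     * e.g. 0x8080808080808080 when long is 64 bits wide. */
    const unsigned long mask = (~0UL / 0xFF) * 0x80;
    size_t i = 0;

    assert(s != NULL || size == 0);

    /* Scan one word at a time: any byte >= 0x80 sets a bit under the mask. */
    while (size - i >= sizeof(unsigned long)) {
        unsigned long word;
        memcpy(&word, s + i, sizeof(word));  /* load without alignment assumptions */
        if (word & mask)
            return 0;
        i += sizeof(word);
    }

    /* Scan the remaining bytes (fewer than one word) one by one. */
    for (; i < size; i++) {
        if ((unsigned char)s[i] & 0x80)
            return 0;
    }
    return 1;
}
```

A caller that wanted the speed of the two-line decoder without giving up safety could run a check of this kind first and fall back to PyUnicode_DecodeASCII() when it fails. That is roughly the scan-then-memcpy shape described above, which is why the checked decoders cannot skip the scan.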