New submission from STINNER Victor:

Serialization of Unicode strings in the pickle module is suboptimal, especially for long strings.
The attached patch optimizes the serialization using new properties of Unicode strings (PEP 393):

* text (protocol 0): avoid any temporary buffer if the string is an ASCII or latin1 string without a "\\" or "\n" character; otherwise use a single small buffer of 64 KB (instead of two buffers)
* binary (protocols 1 and 2): avoid any temporary buffer if the string is an ASCII string or if the string is already available encoded as UTF-8

The current code for protocol 0 uses raw_unicode_escape(), which is really suboptimal: it writes the escaped string into a first buffer, and then allocates a new temporary buffer to store the result with the right size (instead of just calling _PyBytes_Resize).

----------
components: Library (Lib)
files: pickle_unicode.patch
keywords: patch
messages: 167730
nosy: alexandre.vassalotti, haypo, pitrou
priority: normal
severity: normal
status: open
title: pickle: Faster serialization of Unicode strings
type: performance
versions: Python 3.4

Added file: http://bugs.python.org/file26730/pickle_unicode.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue15596>
_______________________________________
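To make the two cases described above concrete, here is a small sketch that inspects what the pickle module writes for a string. It is not taken from the patch (which changes the C implementation); the helper name protocol0_body and the sample strings are made up for illustration. It only shows why a latin1 string without "\\" or "\n" could be copied verbatim in protocol 0, and that the binary protocols store the UTF-8 encoding of the string.

```python
import pickle


def protocol0_body(s):
    """Return the bytes written between the b'V' opcode and the
    terminating newline when pickling the string `s` with protocol 0.

    Hypothetical helper, for illustration only.
    """
    data = pickle.dumps(s, protocol=0)
    assert data.startswith(b"V")  # UNICODE opcode of the text protocol
    # Newlines inside the string are escaped, so the first b"\n" found
    # is the one that terminates the string body.
    return data[1:data.index(b"\n", 1)]


plain = "caf\xe9"    # latin1 string, no backslash and no newline
tricky = "a\\b\nc"   # contains a backslash and a newline

# For the "plain" case the pickled body is just the latin1 bytes,
# so no escaping pass (and no temporary buffer) is needed in principle.
plain_is_verbatim = protocol0_body(plain) == plain.encode("latin-1")

# The "tricky" case has to be escaped, so the body differs.
tricky_is_verbatim = protocol0_body(tricky) == tricky.encode("latin-1")

# Binary protocols store the string as UTF-8 after a length prefix.
text = "h\xe9llo \u20ac"
utf8_in_binary = text.encode("utf-8") in pickle.dumps(text, protocol=2)

print(plain_is_verbatim, tricky_is_verbatim, utf8_in_binary)
```

The output bytes are the same with or without such an optimization; only the number of intermediate copies made while producing them would change.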