[issue25318] Add _PyBytesWriter API to optimize Unicode encoders

STINNER Victor Mon, 05 Oct 2015 05:13:24 -0700

STINNER Victor added the comment:

A few months ago, I wrote an earlier implementation of the _PyBytesWriter API 
which embedded the "current pointer" inside the _PyBytesWriter structure. The 
problem was that GCC produced less efficient code than expected for the hotspot 
of the encoder.


In the new implementation (attached patch), the "current pointer" is unchanged: 
it's still a variable local to the encoder function. Instead, the current 
pointer became a *parameter* to all _PyBytesWriter *functions*.
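
A minimal sketch of this design, not the code from the attached patch: all 
names below (sketch_writer, sketch_writer_prepare, sketch_encode_ascii) are 
hypothetical, and malloc/realloc stand in for the real bytes object 
allocation. It only illustrates how the write position can stay in a local 
variable while the writer functions receive it as a parameter and return the 
possibly relocated pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical writer: it tracks the buffer only, never the write position. */
typedef struct {
    char *buf;         /* start of the heap buffer */
    size_t allocated;  /* capacity in bytes */
} sketch_writer;

/* Allocate at least `size` bytes; return the initial write position or NULL. */
static char *sketch_writer_alloc(sketch_writer *w, size_t size)
{
    if (size == 0)
        size = 1;
    w->buf = malloc(size);
    w->allocated = (w->buf != NULL) ? size : 0;
    return w->buf;
}

/* Make room for `extra` bytes after `pos`. The caller passes its local
   pointer in and gets the (possibly relocated) pointer back. */
static char *sketch_writer_prepare(sketch_writer *w, char *pos, size_t extra)
{
    size_t used = (size_t)(pos - w->buf);
    assert(used <= w->allocated);
    if (extra <= w->allocated - used)
        return pos;
    size_t newsize = (used + extra) * 2;
    char *newbuf = realloc(w->buf, newsize);
    if (newbuf == NULL)
        return NULL;
    w->buf = newbuf;
    w->allocated = newsize;
    return newbuf + used;
}

static void sketch_writer_dealloc(sketch_writer *w)
{
    free(w->buf);
    w->buf = NULL;
    w->allocated = 0;
}

/* Toy ASCII encoder: non-ASCII bytes are written as a 4-byte "\xNN" escape.
   Returns a malloc'ed buffer of *outlen bytes (caller frees), or NULL. */
static char *sketch_encode_ascii(const unsigned char *in, size_t n,
                                 size_t *outlen)
{
    static const char hex[] = "0123456789abcdef";
    sketch_writer w;
    char *p = sketch_writer_alloc(&w, n);  /* local "current pointer" */
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        if (in[i] < 0x80) {
            *p++ = (char)in[i];  /* fast path: no writer call at all */
            continue;
        }
        /* Slow path: reserve the escape plus one byte per remaining input. */
        p = sketch_writer_prepare(&w, p, 4 + (n - i - 1));
        if (p == NULL) {
            sketch_writer_dealloc(&w);
            return NULL;
        }
        *p++ = '\\';
        *p++ = 'x';
        *p++ = hex[in[i] >> 4];
        *p++ = hex[in[i] & 15];
    }
    *outlen = (size_t)(p - w.buf);
    return w.buf;
}
```

In this sketch the loop body for valid input touches only the local p, so the 
compiler is free to keep it in a register; the writer is only involved on the 
error path.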

I don't expect this to affect the performance of encoders for strings that 
encode without errors (when the code calling error handlers is not used), which 
is important since we have very good performance there.

_PyBytesWriter is now restricted to the code that allocates the buffer.

--

bytes_writer.patch:

+    char stackbuf[256];

Oh, I forgot to mention this other small optimization. I also added a small 
buffer allocated on the C stack for the UCS1 encoder (ASCII, Latin1). It may 
slightly speed up encoding when an error handler is used and the output string 
is smaller than 256 bytes.

The optimization comes from the very efficient UTF-8 encoder.
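
A rough sketch of the stack buffer idea, under stated assumptions: the names 
(sketch_stack_writer, sketch_stack_prepare, sketch_encode_latin1) are 
hypothetical, malloc stands in for creating the final bytes object, and the 
"&#N;" replacement only imitates what an error handler might produce. It is 
not the code from bytes_writer.patch. The output is built in a 256-byte array 
on the C stack and moves to the heap only if it outgrows that array.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical writer that starts in a small on-stack buffer. */
typedef struct {
    char *buf;           /* points at stackbuf or at a heap block */
    size_t allocated;    /* capacity of buf in bytes */
    int use_stack;       /* nonzero while buf == stackbuf */
    char stackbuf[256];
} sketch_stack_writer;

static char *sketch_stack_init(sketch_stack_writer *w)
{
    w->buf = w->stackbuf;
    w->allocated = sizeof(w->stackbuf);
    w->use_stack = 1;
    return w->buf;
}

/* Make room for `extra` bytes after `pos`, moving to the heap if needed. */
static char *sketch_stack_prepare(sketch_stack_writer *w, char *pos,
                                  size_t extra)
{
    size_t used = (size_t)(pos - w->buf);
    assert(used <= w->allocated);
    if (extra <= w->allocated - used)
        return pos;
    size_t newsize = (used + extra) * 2;
    char *newbuf;
    if (w->use_stack) {
        newbuf = malloc(newsize);
        if (newbuf != NULL)
            memcpy(newbuf, w->stackbuf, used);
    }
    else {
        newbuf = realloc(w->buf, newsize);
    }
    if (newbuf == NULL)
        return NULL;
    w->buf = newbuf;
    w->allocated = newsize;
    w->use_stack = 0;
    return newbuf + used;
}

/* Copy the output into an exact-size malloc'ed block and release the
   working buffer. Returns NULL on allocation failure. */
static char *sketch_stack_finish(sketch_stack_writer *w, char *pos,
                                 size_t *outlen)
{
    size_t used = (size_t)(pos - w->buf);
    char *result = malloc(used ? used : 1);
    if (result != NULL) {
        memcpy(result, w->buf, used);
        *outlen = used;
    }
    if (!w->use_stack)
        free(w->buf);
    return result;
}

/* Toy Latin-1 encoder: code points above 0xFF become "&#N;".
   Assumes every code point is <= 0x10FFFF. Caller frees the result. */
static char *sketch_encode_latin1(const unsigned int *in, size_t n,
                                  size_t *outlen)
{
    sketch_stack_writer w;
    char *p = sketch_stack_init(&w);
    for (size_t i = 0; i < n; i++) {
        assert(in[i] <= 0x10FFFFu);
        /* "&#1114111;" is 10 bytes, plus the NUL that snprintf writes. */
        p = sketch_stack_prepare(&w, p, 11);
        if (p == NULL) {
            if (!w.use_stack)
                free(w.buf);
            return NULL;
        }
        if (in[i] < 256)
            *p++ = (char)in[i];
        else
            p += snprintf(p, 11, "&#%u;", in[i]);
    }
    return sketch_stack_finish(&w, p, outlen);
}
```

For short outputs this sketch performs a single exact-size allocation at the 
end, instead of allocating a worst-case buffer up front and shrinking it 
afterwards.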

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25318>
_______________________________________