New submission from STINNER Victor:

The attached patch optimizes the UTF-8 encoder for the following error
handlers: ignore, replace, surrogateescape, surrogatepass. It is based on the
patch faster_surrogates_hadling.patch written by Serhiy Storchaka in issue
#24870.
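
For reference, here is a small sketch of what the four handlers produce when
the UTF-8 encoder meets lone surrogates. It only shows the observable behavior
at the Python level, not the C code touched by the patch. The sample string is
an arbitrary choice for illustration; it uses surrogates in the range
U+DC80..U+DCFF because surrogateescape can only encode those.

```python
# Lone surrogates are not encodable to UTF-8 in strict mode, so the
# encoder has to call the error handler for each of them.
text = "a\udc80b\udcffc"  # arbitrary sample chosen for this illustration

HANDLERS = ("ignore", "replace", "surrogateescape", "surrogatepass")
results = {errors: text.encode("utf-8", errors) for errors in HANDLERS}

# The default handler (strict) rejects the string.
try:
    text.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

for errors, encoded in results.items():
    print(f"{errors:15} {encoded!r}")
```

ignore drops the surrogates, replace substitutes "?", surrogateescape maps
U+DC80..U+DCFF back to the bytes 0x80..0xFF, and surrogatepass emits the
3-byte UTF-8 form of each surrogate.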

It also modifies unicode_encode_ucs1() to use memset() for the replace error 
handler. It should be faster for long sequences of unencodable characters, but 
it may be slower for short sequences of unencodable characters.
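
To make the two cases concrete, the sketch below builds one long run and
several short runs of characters that latin-1 cannot encode. It only checks
the result of the replace handler from Python; whether memset() is used
internally is not visible at this level. The inputs are made up for the
illustration.

```python
euro = "\u20ac"  # EURO SIGN, not representable in latin-1

# One long run of unencodable characters: the output is a long run of "?".
long_run = euro * 1000
long_encoded = long_run.encode("latin-1", "replace")

# Short runs (length 1) separated by encodable characters.
short_runs = ("ab" + euro) * 3
short_encoded = short_runs.encode("latin-1", "replace")

print(len(long_encoded), short_encoded)
```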

The patch adds new unit tests and fixes existing ones to ensure that the
utf-8-sig codec is also well tested.

TODO: write a benchmark.
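
A possible starting point for such a benchmark, using only timeit from the
standard library. This is a rough sketch and not the benchmark used for the
patch: the helper name bench(), the input strings and the repeat counts are
all assumptions made for illustration.

```python
import timeit

HANDLERS = ("ignore", "replace", "surrogateescape", "surrogatepass")


def bench(text, errors, number=1000):
    """Return the best time in seconds for `number` calls to text.encode()."""
    timer = timeit.Timer(lambda: text.encode("utf-8", errors))
    return min(timer.repeat(repeat=3, number=number))


# Hypothetical inputs: one long run of surrogates, and the same number of
# surrogates interleaved with ASCII so that every run has length 1.
inputs = {
    "long": "\udc80" * 1000,
    "short": "a\udc80" * 1000,
}

timings = {
    (name, errors): bench(text, errors)
    for name, text in inputs.items()
    for errors in HANDLERS
}

for (name, errors), seconds in sorted(timings.items()):
    print(f"{name:5} {errors:15} {seconds * 1e3:8.3f} ms")
```

Running it on a build with and without the patch would give a first idea of
the effect on long and short sequences of unencodable characters.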

See also issue #25227, which optimized the ASCII and latin1 encoders for the
surrogateescape error handler.

----------
components: Unicode
files: utf8_encoder_errors.patch
keywords: patch
messages: 251845
nosy: ezio.melotti, haypo, naoki, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Optimize UTF-8 encoder with error handlers
type: performance
versions: Python 3.6
Added file: http://bugs.python.org/file40619/utf8_encoder_errors.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25267>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
