On Sat, Jan 21, 2017 at 11:51 AM, Pete Forman <petef4+use...@gmail.com> wrote:
> MRAB <pyt...@mrabarnett.plus.com> writes:
>
>> As someone who has written an extension, I can tell you that I much
>> prefer dealing with a fixed number of bytes per codepoint than a
>> variable number of bytes per codepoint, especially as I'm also
>> supporting earlier versions of Python where that was the case.
>
> At the risk of sounding harsh, if supporting variable bytes per
> codepoint is a pain you should roll with it for the greater good of
> supporting users.

That hasn't been demonstrated, though. There's plenty of evidence
regarding cache usage that shows that direct indexing is incredibly
beneficial on large strings. What are the benefits of variable-sized
encodings? AFAIK, the only real benefit is that you can use less
memory for strings that contain predominantly ASCII but a small number
of astral characters (plus *maybe* a faster encode-to-UTF-8; you
wouldn't get a faster decode-from-UTF-8, because you still need to
check that the byte sequence is valid).

Can you show a use-case that would be materially improved by UTF-8?

ChrisA
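
P.S. A rough sketch of the trade-off, in case it helps. The helper
name utf8_offset is made up for illustration and is not part of any
library; the sys.getsizeof figures are CPython-specific (3.3+, PEP 393)
and will vary by version and platform, so treat them as indicative only.

```python
import sys


def utf8_offset(data: bytes, index: int) -> int:
    """Return the byte offset of code point `index` in valid UTF-8 `data`.

    Hypothetical helper: with a variable-width encoding, the position of
    the Nth code point has to be found by scanning from the start.
    """
    seen = 0
    for pos, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; anything else starts a code point.
        if byte & 0xC0 != 0x80:
            if seen == index:
                return pos
            seen += 1
    raise IndexError(index)


ascii_text = "a" * 1000
mixed_text = ascii_text + "\U0001F600"  # mostly ASCII plus one astral character

# Fixed-width storage: one astral character widens every code point in the string.
print(sys.getsizeof(ascii_text), sys.getsizeof(mixed_text))

# UTF-8 storage: only the astral character costs extra (1000 + 4 bytes).
encoded = mixed_text.encode("utf-8")
print(len(encoded))

# Indexing: direct on the str, a linear scan on the UTF-8 bytes.
print(mixed_text[1000] == "\U0001F600", utf8_offset(encoded, 1000))
```

So the memory saving is real for that one class of string, but every
index into the UTF-8 form pays for a scan (or needs an auxiliary index).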