[issue25190] Define StringIO seek offset as code point offset

Martin Panter Sun, 20 Sep 2015 23:52:40 -0700

Martin Panter added the comment:

I see the _pyio implementation wraps BytesIO with UTF-8 encoding. Perhaps it 
would be okay to change to UTF-32 (a fixed-width encoding, four bytes per code 
point). That would use more memory, but the C implementation seems to use a 
Py_UCS4 buffer already. Then you could reimplement seek(), tell(), and 
truncate() by detaching and rebuilding the TextIOWrapper over the top. Not 
super efficient, but perhaps that does not matter for the _pyio implementation.
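
To illustrate why a fixed-width encoding makes code point offsets cheap, here 
is a minimal sketch. It is not the _pyio code and it does not detach and 
rebuild a TextIOWrapper; as a simplification it encodes and decodes directly 
on top of a BytesIO. The class name CodePointBuffer and the choice of the 
"utf-32-le" codec (no BOM, so offset n is byte 4*n) are assumptions made for 
this example only.

```python
import io

CODEC = "utf-32-le"  # assumption: little-endian UTF-32, no BOM written
WIDTH = 4            # every code point occupies exactly four bytes


class CodePointBuffer:
    """Hypothetical text buffer whose offsets are code point offsets."""

    def __init__(self, initial=""):
        self._raw = io.BytesIO(initial.encode(CODEC))

    def tell(self):
        # Byte position is always a multiple of WIDTH.
        return self._raw.tell() // WIDTH

    def seek(self, offset):
        if offset < 0:
            raise ValueError("negative seek offset")
        self._raw.seek(offset * WIDTH)
        return offset

    def read(self, size=-1):
        nbytes = -1 if size < 0 else size * WIDTH
        return self._raw.read(nbytes).decode(CODEC)

    def write(self, text):
        # Overwrites in place, one code point per four-byte slot.
        self._raw.write(text.encode(CODEC))
        return len(text)

    def truncate(self, size=None):
        if size is None:
            size = self.tell()
        self._raw.truncate(size * WIDTH)
        return size

    def getvalue(self):
        return self._raw.getvalue().decode(CODEC)
```

With this sketch, seek(3) on a buffer holding "a\u00e9\u20ac\U0001f600z" lands 
on the fourth code point regardless of how many bytes each earlier character 
would need in UTF-8. The cost is the memory use mentioned above: four bytes 
for every character, including ASCII.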


The fact that it is so hard to do this (random write access to a large Unicode 
buffer) in pure Python could be another argument for supporting this in the 
default StringIO implementation :)

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25190>
_______________________________________