Matthew Barnett <pyt...@mrabarnett.plus.com> added the comment:

For what it's worth, I've had an idea about string storage, roughly based on how *nix stores data on disk.
If a string is small, point to a block of codepoints. If a string is medium-sized, point to a block of pointers to codepoint blocks. If a string is large, point to a block of pointers to pointer blocks. This means that a large string doesn't need a single large allocation. The level of indirection can be increased as necessary.

For simplicity, all codepoint blocks contain the same number of codepoints, except the final codepoint block, which may contain fewer.

A codepoint block may use the minimum width necessary (1, 2 or 4 bytes) to store all of its codepoints. This means that there are no surrogates and that different sections of the string can be stored in different widths to reduce memory usage.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12729>
_______________________________________
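
The layered layout described in the comment above can be sketched in a few lines of Python. This is only an illustration of the idea, not how CPython stores strings. The names BLOCK_SIZE, FANOUT, Block and BlockString are hypothetical, and the sizes are deliberately tiny so that the extra levels of indirection show up with short strings.

```python
BLOCK_SIZE = 4  # codepoints per codepoint block (hypothetical, tiny for demonstration)
FANOUT = 4      # pointers per pointer block (hypothetical)


def min_width(codepoints):
    """Return the smallest width (1, 2 or 4 bytes) that fits every codepoint."""
    largest = max(codepoints, default=0)
    if largest < 0x100:
        return 1
    if largest < 0x10000:
        return 2
    return 4


class Block:
    """A codepoint block: up to BLOCK_SIZE codepoints stored at one fixed width."""

    def __init__(self, codepoints):
        self.width = min_width(codepoints)
        self.length = len(codepoints)
        self.data = b"".join(cp.to_bytes(self.width, "little") for cp in codepoints)

    def __getitem__(self, i):
        start = i * self.width
        return int.from_bytes(self.data[start:start + self.width], "little")


class BlockString:
    """An immutable string stored as a tree of pointer blocks over codepoint blocks."""

    def __init__(self, text):
        cps = [ord(ch) for ch in text]
        self.length = len(cps)
        # Every codepoint block is full except possibly the last one.
        level = [Block(cps[i:i + BLOCK_SIZE])
                 for i in range(0, len(cps), BLOCK_SIZE)] or [Block([])]
        self.depth = 0  # number of pointer-block levels above the codepoint blocks
        # Add levels of indirection until a single root remains.
        while len(level) > 1:
            level = [level[i:i + FANOUT] for i in range(0, len(level), FANOUT)]
            self.depth += 1
        self.root = level[0]

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if not 0 <= index < self.length:
            raise IndexError("index out of range")
        block_no, offset = divmod(index, BLOCK_SIZE)
        node = self.root
        # Walk down the pointer blocks, one level at a time.
        for level in range(self.depth - 1, -1, -1):
            child, block_no = divmod(block_no, FANOUT ** level)
            node = node[child]
        return chr(node[offset])

    def leaves(self):
        """Return the codepoint blocks in order."""
        nodes = [self.root]
        for _ in range(self.depth):
            nodes = [child for node in nodes for child in node]
        return nodes


text = "abcd" + "\u20ac" * 4 + "\U0001F600"
s = BlockString(text)
print([block.width for block in s.leaves()])
```

Because every codepoint block except the last is full, indexing stays a matter of one divmod per level, even though neighbouring blocks may use different widths. No single allocation in the sketch is larger than one block.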