On Wed, May 13, 2015 at 11:23 AM, Steven D'Aprano
<steve+comp.lang.pyt...@pearwood.info> wrote:
> On Wed, 13 May 2015 03:26 am, Chris Angelico wrote:
>
>> back when MicroPython was debating the implementation of Unicode
>> strings, there was a lengthy discussion on python-dev about whether
>> it's okay for string subscripting to be O(n) instead of O(1), and the
>> final decision was that yes, that's an implementation detail. (UTF-8
>> internal string representation, so iterating over a string would still
>> yield characters in overall O(n), but iterating up to the string's
>> length and subscripting for each character would become O(n*n) on
>> uPy.)
>
> o_O
>
> Got a link to that? I must have missed it.
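To make the quoted complexity claims concrete, here is a minimal sketch. It assumes the string is stored as a plain `bytes` buffer of valid UTF-8. The helper names (`utf8_index`, `iter_chars`, `chars_by_index`) are hypothetical, and none of this is MicroPython's actual code. Subscripting has to walk from the start, counting bytes that are not continuation bytes (continuation bytes have the form 0b10xxxxxx), so each lookup is O(n). Plain iteration stays O(n) overall, but an index-based loop repeats that scan n times.

```python
from typing import Iterator, List


def utf8_index(buf: bytes, i: int) -> str:
    """Return the i-th character of a UTF-8 buffer by scanning from the start: O(n)."""
    if i < 0:
        raise IndexError("negative indices not supported in this sketch")
    count = -1
    start = None
    for pos, byte in enumerate(buf):
        if (byte & 0xC0) != 0x80:  # not a continuation byte, so a new character starts here
            count += 1
            if count == i:
                start = pos
            elif count == i + 1:
                return buf[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("string index out of range")
    return buf[start:].decode("utf-8")  # the wanted character is the last one


def iter_chars(buf: bytes) -> Iterator[str]:
    """Yield every character in one left-to-right pass: O(n) overall."""
    start = 0
    for pos in range(1, len(buf) + 1):
        if pos == len(buf) or (buf[pos] & 0xC0) != 0x80:
            yield buf[start:pos].decode("utf-8")
            start = pos


def chars_by_index(buf: bytes) -> List[str]:
    """Index-based loop: n calls to utf8_index, each scanning up to n bytes, so O(n*n)."""
    n = sum(1 for b in buf if (b & 0xC0) != 0x80)  # even len() needs a scan here
    return [utf8_index(buf, k) for k in range(n)]


print(utf8_index("Hello, world".encode("utf-8"), 4))
print(utf8_index("naïve café".encode("utf-8"), 2))
```

Both approaches produce the same characters; only the cost differs. A real implementation could cache offsets or store a length field to soften this, but the sketch deliberately keeps the naive scan.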
Linking to python-dev is a bit fiddly and/or unstable due to URL changes, plus the discussion there was pretty long and rambly. Probably the best I can do is point you to the tracker issue where I opened the original question:

https://github.com/micropython/micropython/issues/657

(The biggest issue was that uPy was, at the time, fundamentally incompatible with Python's stipulated semantics - imagine all the problems of a narrow build of CPython <3.3, only more frequent because it's actually UTF-8.)

It was finally decided, I think, that Python-the-language didn't actually mandate O(1) indexing, meaning that a microcontroller (on which strings aren't going to be gigantic anyway) is welcome to use a UTF-8 internal representation, with "Hello, world"[4] required to scan across and count non-continuation bytes to find the right character.

Whether or not uPy actually ended up accepting the requirements of proper Unicode support I don't know, as I'm no longer involved with the project.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list