On Fri, Jul 14, 2017 at 6:15 PM, Marko Rauhamaa <ma...@pacujo.net> wrote:
> Chris Angelico <ros...@gmail.com>:
>
>> On Fri, Jul 14, 2017 at 4:30 PM, Marko Rauhamaa <ma...@pacujo.net> wrote:
>>> When people use Unicode, they are expecting to be able to deal in real
>>> characters. I would expect:
>>>
>>> len(text) to give me the length in characters
>>> text[-1] to evaluate to the last character
>>> re.match("a.c", text) to match a character between a and c
>>>
>>> So the question is, should we have a third type for text. Or should the
>>> semantics of strings be changed to be based on characters?
>>
>> What is the length of a string? How often do you actually care about
>> the number of grapheme clusters - and not, for example, about the
>> pixel width?
>
> A good question. I have in the past argued that the string should be a
> special data type for the specialist text processing needs.
>
> However, I happen to have fooled around with a character-graphics based
> game in recent days, and even professionally, I use character-based
> alignment quite often. Consider, for example, a Python source code
> editor where you want to limit the length of the line based on the
> number of characters more typically than based on the number of pixels.
>
> Furthermore, you only dismissed my question about
>
> len(text)
>
> What about
>
> text[-1]
> re.match("a.c", text)
The considerations and concerns in the second half of my paragraph - the bit you didn't quote - directly address these two.

ChrisA
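A minimal sketch of the behaviour being argued about, assuming Python 3's `str`, which indexes by code point. The variable names and sample strings (`decomposed`, `composed`, `flag`) are illustrative and do not come from the thread. The sketch shows that `len(text)`, `text[-1]` and `re.match("a.c", text)` all work on code points. It also shows that NFC normalization only partly closes the gap to user-perceived "characters".

```python
import re
import unicodedata

# "é" spelled as two code points: "e" followed by COMBINING ACUTE ACCENT (U+0301)
decomposed = "cafe\u0301"
# NFC composes e + U+0301 into the single precomposed code point U+00E9
composed = unicodedata.normalize("NFC", decomposed)

# len() counts code points, not what the reader sees as characters
print(len(decomposed), len(composed))  # 5 4

# Indexing is by code point too, so the last element is the bare accent
print(hex(ord(decomposed[-1])))  # 0x301

# "." matches exactly one code point, so it cannot cover "e" + U+0301
print(re.match("a.c", "ae\u0301c"))  # None
print(re.match("a.c", unicodedata.normalize("NFC", "ae\u0301c")) is not None)  # True

# Normalization is not a general fix: regional indicators F + I render as one
# flag, but they have no precomposed form, so NFC leaves two code points
flag = "\U0001F1EB\U0001F1EE"
print(len(unicodedata.normalize("NFC", flag)))  # 2
```

The standard library has no grapheme-cluster segmentation (Unicode UAX #29). The third-party `regex` module offers `\X` for extended grapheme clusters, and that would be one way to approximate the "character" semantics Marko describes without changing `str` itself.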