Re: Grapheme clusters, a.k.a. real characters

Chris Angelico Fri, 14 Jul 2017 01:33:33 -0700

On Fri, Jul 14, 2017 at 6:15 PM, Marko Rauhamaa <[email protected]> wrote:
> Chris Angelico <[email protected]>:
>
>> On Fri, Jul 14, 2017 at 4:30 PM, Marko Rauhamaa <[email protected]> wrote:
>>> When people use Unicode, they are expecting to be able to deal in real
>>> characters. I would expect:
>>>
>>>    len(text)               to give me the length in characters
>>>    text[-1]                to evaluate to the last character
>>>    re.match("a.c", text)   to match a character between a and c
>>>
>>> So the question is: should we have a third type for text, or should the
>>> semantics of strings be changed to be based on characters?
>>
>> What is the length of a string? How often do you actually care about
>> the number of grapheme clusters - and not, for example, about the
>> pixel width?
>
> A good question. I have in the past argued that the string should be a
> special data type for specialist text-processing needs.
>
> However, I happen to have fooled around with a character-graphics-based
> game in recent days, and even professionally, I use character-based
> alignment quite often. Consider, for example, a Python source code
> editor where you typically want to limit line length by the number of
> characters rather than by the number of pixels.
>
> Furthermore, you only dismissed my question about
>
>    len(text)
>
> What about
>
>    text[-1]
>    re.match("a.c", text)


The considerations and concerns in the second half of my paragraph (the
bit you didn't quote) directly address these two.
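To make the three expectations concrete: Python's str indexes code points, not grapheme clusters, so len(), text[-1] and a regex "." all operate on code points. The minimal stdlib-only sketch below illustrates this. The helpers naive_graphemes() and naive_display_width() are hypothetical illustrations, not a real API: the first only attaches combining marks to the preceding code point (a rough approximation that ignores most of the Unicode UAX #29 segmentation rules, such as ZWJ emoji sequences, flags and Hangul jamo), and the second is a rough terminal-column count based on unicodedata.east_asian_width(), which is closer to the "pixel width" concern than either count.

```python
import re
import unicodedata


def naive_graphemes(text: str) -> list[str]:
    """Hypothetical helper: attach combining marks to the preceding code point.

    A simplified approximation of grapheme clusters only; it does NOT
    implement the full UAX #29 rules (ZWJ emoji, regional-indicator
    flags, Hangul jamo, etc.).
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # combining mark joins the previous cluster
        else:
            clusters.append(ch)  # anything else starts a new cluster
    return clusters


def naive_display_width(text: str) -> int:
    """Hypothetical helper: rough terminal column count.

    Wide/fullwidth characters count as 2 columns, combining marks as 0,
    everything else as 1. Control characters are not handled.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


# "café" spelled with a decomposed e + COMBINING ACUTE ACCENT (U+0301)
text = "cafe\u0301"

print(len(text))                    # 5: code points
print(len(naive_graphemes(text)))   # 4: user-perceived characters
print(hex(ord(text[-1])))           # 0x301: text[-1] is the bare accent
print(re.match("a.c", "ae\u0301c")) # None: '.' matches a single code point
print(naive_display_width("\u65e5\u672c"))  # 4: two CJK ideographs, 2 columns each
```

NFC normalization (unicodedata.normalize("NFC", ...)) happens to compose e + U+0301 into the single code point U+00E9, after which all three expectations hold for this string. It cannot do so for every sequence, though, since many combining sequences have no precomposed form. For real segmentation, the third-party regex module supports \X to match an extended grapheme cluster.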

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list