[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

Martin v. Löwis Sat, 01 Oct 2011 03:59:56 -0700

Martin v. Löwis <[email protected]> added the comment:

>  * Word characters are Alphabetic + Mn+Mc+Me + Nd + Pc.


Where did you get that definition from? UTS#18 defines
"<word_character>", which is Alphabetic + U+200C + U+200D
(i.e. not including marks, but including those

> I think what you are looking for here are Word characters without
> Nd + Pc, so just Alphabetic + Mn+Mc+Me.
> 
> Is that right?

With your definition of "Word character" above, yes, that's right.
Marks won't start a word, though.
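
As a rough illustration of the rule discussed above (word characters are
letters plus Mn/Mc/Me marks, but a mark cannot start a word), here is a
minimal sketch. It is not the str.title() implementation. The names
title_sketch, is_word_char and can_start_word are hypothetical, and since
the unicodedata module does not expose the Alphabetic property, the L*
general categories are used as an approximation, which is an assumption.

```python
import unicodedata

# General categories for combining marks.
MARK_CATEGORIES = {"Mn", "Mc", "Me"}


def can_start_word(ch):
    # Assumption: approximate "Alphabetic" with the L* general categories,
    # because unicodedata has no direct Alphabetic lookup.
    return unicodedata.category(ch).startswith("L")


def is_word_char(ch):
    # Letters and combining marks may continue a word.
    return can_start_word(ch) or unicodedata.category(ch) in MARK_CATEGORIES


def title_sketch(text):
    out = []
    in_word = False
    for ch in text:
        if in_word and is_word_char(ch):
            # Inside a word: lowercase letters; marks are unaffected.
            out.append(ch.lower())
        elif can_start_word(ch):
            # Only a letter can start a word; titlecase it.
            out.append(ch.title())
            in_word = True
        else:
            # Anything else, including a stray leading mark, ends the word.
            out.append(ch)
            in_word = False
    return "".join(out)
```

With this sketch, a combining mark in the middle of a word no longer
causes the following letter to be treated as the start of a new word.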

As for terminology: I think the documentation should continue to
speak about "words" and "letters", and then define what is meant
in this context. The Unicode Consortium did not invent the term
"letter", so we should feel free to use it more liberally than just
to refer to the L* categories.

----------
title: str.title() is overzealous by upcasing combining marks inappropriately 
-> str.title() is overzealous by upcasing combining marks inappropriately

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue12737>
_______________________________________