[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

Martin v. Löwis Sun, 18 Sep 2011 01:46:02 -0700

Martin v. Löwis <mar...@v.loewis.de> added the comment:

Tom: it's intentional that .title() doesn't use traditional word break 
algorithms. In 2.x, "foo3bar".title() is "Foo3Bar", i.e. the 3 counts as a word 
end. So neither UTS#18 \w nor UAX#29 apply. So in UTS#18 terminology, .title() 
matches more closely \alpha+, despite UTS#18 saying that this shouldn't be used 
for word-breaking.
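
To make the "foo3bar" case concrete, here is a minimal sketch. The helper
title_by_alpha_runs is hypothetical and is not how CPython implements
.title(); it only approximates the "\alpha+" idea by treating each run of
str.isalpha() characters as one word. Note that str.isalpha() is based on
general categories, so it is only a stand-in for UTS#18 \alpha.

```python
from itertools import groupby


def title_by_alpha_runs(text):
    """Titlecase each maximal run of alphabetic characters (hypothetical helper)."""
    parts = []
    # groupby splits the string into alternating alphabetic / non-alphabetic runs.
    for is_alpha, run in groupby(text, key=str.isalpha):
        chunk = "".join(run)
        if is_alpha:
            # Uppercase the first character of the run, lowercase the rest.
            chunk = chunk[:1].upper() + chunk[1:].lower()
        parts.append(chunk)
    return "".join(parts)


# The digit ends one "word" and starts another, as described above.
print("foo3bar".title())               # Foo3Bar
print(title_by_alpha_runs("foo3bar"))  # Foo3Bar
```

A UAX#29-style word breaker would instead treat "foo3bar" as a single word,
which is why neither that algorithm nor \w describes what .title() does.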


It's not clear to me how UTS#18 defines \alpha. On the one hand, they say that 
marks should be included, OTOH they refer to the Alphabetic derived category 
which doesn't include marks, except for the few that have been included in 
Other_Alphabetic.
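
As a rough illustration of why marks are the sticking point: the standard
unicodedata module does not expose the derived Alphabetic property or
Other_Alphabetic, so the sketch below only inspects the general category.
The helper is_mark is hypothetical, and str.isalpha() here is the Python
definition (general categories Lm, Lt, Lu, Ll, Lo), not the UTS#18 one.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"  # COMBINING ACUTE ACCENT


def is_mark(ch):
    """True if ch has a general category of Mn, Mc, or Me (hypothetical helper)."""
    return unicodedata.category(ch).startswith("M")


print(unicodedata.category(COMBINING_ACUTE))  # Mn
print(is_mark(COMBINING_ACUTE))               # True
print(COMBINING_ACUTE.isalpha())              # False
print(is_mark("e"), "e".isalpha())            # False True
```

So a test built on str.isalpha() alone sees a combining mark as a
non-letter, and anything that follows it looks like the start of a new word.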

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12737>
_______________________________________
