Steven D'Aprano added the comment: http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
talks about *grapheme clusters*, not "graphemes" alone, and it seems clear to me that they are language-dependent. For example, it says:

    The Unicode Standard provides default algorithms for determining grapheme cluster boundaries, with two variants: legacy grapheme clusters and extended grapheme clusters. The most appropriate variant depends on the language and operation involved.
    ...
    These algorithms can be adapted to produce tailored grapheme clusters for specific locales...

Nevertheless, even just a basic API to either the *legacy grapheme cluster* or the *extended grapheme cluster* algorithm would be a good start. Can I suggest that the unicodedata module might be the right place for it?

And thank you for volunteering to do the work on this!

----------
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30717>
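To make concrete what such an API would return, here is a rough sketch. The name `approx_legacy_graphemes` is hypothetical and is not an existing unicodedata function. As far as I know, unicodedata does not expose the Grapheme_Cluster_Break property, so the sketch approximates it from general categories and applies only three of the UAX #29 rules (keep CR LF together, break around control characters, attach Extend-like characters). It is therefore not a conforming implementation: Hangul syllables, regional indicators, Prepend, and spacing marks are not handled.

```python
import unicodedata
from typing import List

# Hypothetical helper, NOT part of unicodedata. It approximates UAX #29
# legacy grapheme clusters using general categories only:
#   * Mn/Me marks plus ZWNJ/ZWJ stand in for the Extend/ZWJ properties
#   * Hangul, regional indicators, Prepend, etc. are ignored
_EXTEND_CATEGORIES = {"Mn", "Me"}
_JOINERS = {"\u200c", "\u200d"}  # ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER


def approx_legacy_graphemes(text: str) -> List[str]:
    clusters: List[str] = []
    for ch in text:
        if clusters:
            last = clusters[-1][-1]
            # GB3: never split CR LF.
            if last == "\r" and ch == "\n":
                clusters[-1] += ch
                continue
            # GB4: always break after a control character (CR, LF, other Cc);
            # GB9: otherwise attach combining marks and joiners to the
            # preceding cluster.
            extends = unicodedata.category(ch) in _EXTEND_CATEGORIES or ch in _JOINERS
            if extends and unicodedata.category(last) != "Cc":
                clusters[-1] += ch
                continue
        # Everything else (including controls, per GB5) starts a new cluster.
        clusters.append(ch)
    return clusters


text = "e\u0301a\r\nb"  # 'e' + COMBINING ACUTE ACCENT, 'a', CR LF, 'b'
clusters = approx_legacy_graphemes(text)
print(len(text), len(clusters))  # 6 code points, 4 clusters
```

A real API in unicodedata would presumably be driven by the Grapheme_Cluster_Break data from the UCD rather than by general categories, and would need a way to choose between the legacy and extended variants.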