Re: Unicode script

Terry Reedy Thu, 15 Dec 2016 14:00:46 -0800

On 12/15/2016 1:06 PM, MRAB wrote:

On 2016-12-15 16:53, Steve D'Aprano wrote:

Suppose I have a Unicode character, and I want to determine the script or
scripts it belongs to.


For example:

U+0033 DIGIT THREE "3" belongs to the script "COMMON";
U+0061 LATIN SMALL LETTER A "a" belongs to the script "LATIN";
U+03BE GREEK SMALL LETTER XI "ξ" belongs to the script "GREEK".


Is this information available from Python?


More about Unicode scripts:

http://www.unicode.org/reports/tr24/
http://www.unicode.org/Public/UCD/latest/ucd/Scripts.txt
http://www.unicode.org/Public/UCD/latest/ucd/ScriptExtensions.txt

Interestingly, there's issue 6331 "Add unicode script info to the
unicode database". Looks like it didn't make it into Python 3.6.


https://bugs.python.org/issue6331

Opened in 2009, with a patch and 2 revisions for 2.x. At least the Python code needs to be updated.

Approved in principle by Martin, then the unicodedata curator but no longer active. Neither, very much, are the other 2 listed in the Experts Index.

From what I could see, both the Python API (there is no doc patch yet) and the internal implementation need more work. If I were to get involved, I would look at the APIs of PyICU (see Eryk Sun's post) and the unicodescript module on PyPI (mentioned by Pander Musubi, on the issue).
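
Until something like that lands in unicodedata, one workaround is to parse Scripts.txt directly. What follows is only a minimal sketch, not the API proposed on the issue: the names SAMPLE, RANGES, parse_scripts and script are made up for illustration, and SAMPLE is a handful of lines in the Scripts.txt format rather than the real file, which would have to be downloaded from the URL above. Note that the file spells script names as "Common", "Latin", "Greek" rather than in upper case, and that code points it does not list default to "Unknown".

```python
import bisect

# A few lines in the Scripts.txt format (abridged sample, an assumption
# for illustration; read the real file from unicode.org instead).
SAMPLE = """\
# Scripts.txt (sample)
0030..0039    ; Common # Nd  [10] DIGIT ZERO..DIGIT NINE
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Latin # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
03A3..03E1    ; Greek # L&  [63] GREEK CAPITAL LETTER SIGMA..GREEK SMALL LETTER SAMPI
"""

def parse_scripts(text):
    """Return a sorted list of (first, last, script) code point ranges."""
    ranges = []
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()    # drop comments
        if not line:
            continue                            # blank or comment-only
        codes, name = (field.strip() for field in line.split(';'))
        first, _, last = codes.partition('..')  # single code point: no '..'
        ranges.append((int(first, 16), int(last or first, 16), name))
    ranges.sort()
    return ranges

def script(char, ranges, default='Unknown'):
    """Return the script of one character (hypothetical helper)."""
    cp = ord(char)
    # Index of the last range that starts at or before cp.
    i = bisect.bisect_right(ranges, (cp, 0x110000, '')) - 1
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return default

RANGES = parse_scripts(SAMPLE)
for ch in '3a\u03be':
    print('U+%04X' % ord(ch), script(ch, RANGES))
```

This only covers the single-valued Script property; ScriptExtensions.txt, where a character can belong to several scripts, would need a lookup that returns a set instead.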


--
Terry Jan Reedy


