Steven D'Aprano <steve+pyt...@pearwood.info> added the comment: Here's my implementation:
```python
from fnmatch import translate
from re import IGNORECASE, compile as re_compile
from unicodedata import lookup as _lookup
from unicodedata import name

_NAMES = None

def getnames():
    # Build (once) and cache the list of every assigned character name.
    global _NAMES
    if _NAMES is None:
        _NAMES = []
        for i in range(0x110000):
            s = name(chr(i), '')
            if s:
                _NAMES.append(s)
    return _NAMES

def lookup(name_or_glob):
    if any(c in name_or_glob for c in '*?['):
        # Glob pattern: translate it to a regex and return every matching name.
        match = re_compile(translate(name_or_glob), flags=IGNORECASE).match
        return [n for n in getnames() if match(n)]
    else:
        # Plain name: defer to unicodedata.lookup().
        return _lookup(name_or_glob)
```

The major limitation of my implementation is that it doesn't match name aliases or named sequences:

http://www.unicode.org/Public/11.0.0/ucd/NameAliases.txt
http://www.unicode.org/Public/11.0.0/ucd/NamedSequences.txt

For example:

```python
lookup('TAMIL SYLLABLE TAA?')  # NamedSequence
```

ought to return ['தா'] but doesn't.

Parts of the Unicode documentation use the convention that canonical names are in UPPERCASE, aliases are in lowercase, and sequences are in Mixed Case, and I think we should follow that convention:

http://www.unicode.org/charts/aboutcharindex.html

That makes it easy to see which name is canonical and which isn't.

----------

Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35549>
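A possible way to lift the alias/sequence limitation, offered only as a sketch under several assumptions: the caller supplies the text of NameAliases.txt and NamedSequences.txt (downloading or bundling them is left out), the files use their semicolon-separated layout (`CODE;ALIAS;TYPE` and `NAME;CODE CODE ...`), and the helpers `parse_aliases`, `parse_sequences` and `glob_lookup` are hypothetical names, not part of `unicodedata`. Following the case convention above, canonical names stay UPPERCASE, aliases are reported in lowercase, and sequence names in Mixed Case, which is approximated here with `str.title()` (an assumption about the exact casing).

```python
import re
from fnmatch import translate
from functools import lru_cache
from unicodedata import name


@lru_cache(maxsize=None)
def _canonical_names():
    """All (name, char) pairs for assigned characters, built once and cached."""
    pairs = []
    for i in range(0x110000):
        n = name(chr(i), '')
        if n:
            pairs.append((n, chr(i)))
    return tuple(pairs)


def parse_aliases(text):
    """Hypothetical parser for NameAliases.txt lines: 'CODE;ALIAS;TYPE'."""
    aliases = []
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, alias, _type = line.split(';')
        aliases.append((alias, chr(int(code, 16))))
    return aliases


def parse_sequences(text):
    """Hypothetical parser for NamedSequences.txt lines: 'NAME;CODE CODE ...'."""
    sequences = []
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()
        if not line:
            continue
        seq_name, codes = line.split(';')
        chars = ''.join(chr(int(c, 16)) for c in codes.split())
        sequences.append((seq_name, chars))
    return sequences


def glob_lookup(pattern, aliases=(), sequences=()):
    """Return (display_name, string) pairs whose name matches the glob."""
    match = re.compile(translate(pattern), flags=re.IGNORECASE).match
    results = []
    # Canonical names: reported in UPPERCASE, as unicodedata.name() gives them.
    for n, ch in _canonical_names():
        if match(n):
            results.append((n, ch))
    # Aliases: reported in lowercase.
    for alias, ch in aliases:
        if match(alias):
            results.append((alias.lower(), ch))
    # Named sequences: reported in Mixed Case (assumed here to be str.title()).
    for seq_name, chars in sequences:
        if match(seq_name):
            results.append((seq_name.title(), chars))
    return results
```

Returning (name, string) pairs rather than bare names is one way to let a named-sequence match yield the sequence itself, as the TAMIL SYLLABLE TAA example expects, while the casing still shows which kind of name matched.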