On Saturday, September 3, 2016 at 7:55:48 AM UTC-4, Veek. M wrote:
> https://mail.python.org/pipermail//python-ideas/2014-October/029630.htm
>
> Wanted to know if the above link idea, had been implemented and if
> there's a module that accepts a pattern like 'cap' and give you all the
> instances of unicode 'CAP' characters.
> ⋂ \bigcap
> ⊓ \sqcap
> ∩ \cap
> ♑ \capricornus
> ⪸ \succapprox
> ⪷ \precapprox
>
> (above's from tex)
>
> I found two useful modules in this regard: unicode_tex, unicodedata
> but unicodedata is a builtin which does not do globs, regexs - so it's
> kind of limiting in nature.
>
> Would be nice if you could search html/xml character entity references
> as well.

The unicodedata module has all the information you need for searching
Unicode character names. While it doesn't provide regexes or globs, it's
all in memory, so it's not bad to just iterate over the characters and
pick out what you need. But 'CAP' also appears in 'CAPITAL', so a plain
substring search gives more than 1800 matches:

    >>> import unicodedata
    >>> for c in range(32, 0x110000):
    ...     try:
    ...         name = unicodedata.name(chr(c))
    ...     except ValueError:
    ...         continue
    ...     if 'CAP' in name:
    ...         print(c, name)
    ...
    65 LATIN CAPITAL LETTER A
    66 LATIN CAPITAL LETTER B
    ..
    .. many other lines, mostly with CAPITAL in them
    ..
    917593 TAG LATIN CAPITAL LETTER Y
    917594 TAG LATIN CAPITAL LETTER Z
    >>>

These were the character names without "CAPITAL":

    8419 COMBINING ENCLOSING KEYCAP
    8851 SQUARE CAP
    9232 SYMBOL FOR DATA LINK ESCAPE
    9243 SYMBOL FOR ESCAPE
    9809 CAPRICORN
    11839 CAPITULUM
    41657 YI SYLLABLE CAP
    52290 HANGUL SYLLABLE CAP
    66003 PHAISTOS DISC SIGN CAPTIVE
    119050 MUSICAL SYMBOL DA CAPO
    127750 CITYSCAPE AT DUSK
    127891 GRADUATION CAP
    127956 SNOW CAPPED MOUNTAIN
    127961 CITYSCAPE
    128287 KEYCAP TEN
    128846 ALCHEMICAL SYMBOL FOR CAPUT MORTUUM

--Ned.
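
If you do want regex matching over the names, the same loop can be wrapped around the re module. This is only a rough sketch: find_chars is a made-up helper name, not something unicodedata provides, and the case-insensitive default is my assumption about what would be convenient.

```python
import re
import unicodedata


def find_chars(pattern, flags=re.IGNORECASE):
    """Yield (code point, character, name) for names matching a regex.

    Hypothetical helper for illustration; not part of unicodedata.
    """
    regex = re.compile(pattern, flags)
    for code in range(32, 0x110000):
        ch = chr(code)
        # Passing a default avoids the ValueError that unicodedata.name()
        # raises for code points without a name.
        name = unicodedata.name(ch, "")
        if name and regex.search(name):
            yield code, ch, name


# \bCAP\b matches CAP only as a whole word, so CAPITAL and KEYCAP are skipped.
for code, ch, name in find_chars(r"\bCAP\b"):
    print(code, name)
```

The exact set of matches depends on the Unicode version bundled with your Python, so newer releases may list more characters than older ones.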