On Sat, Apr 10, 2021 at 12:15 AM Paul Bryan <[email protected]> wrote:
>
> This sounds more like a Unicode thing than a generic string thing. And, in
> Unicode, Greek characters are included in multiple groupings. Searching for
> "Theta" to see what we get:
>
> Greek and Coptic:
>   U+0398 GREEK CAPITAL LETTER THETA
>   U+03B8 GREEK SMALL LETTER THETA
>   U+03D1 GREEK THETA SYMBOL
>   U+03F4 GREEK CAPITAL THETA SYMBOL
>
> Phonetic Extensions Supplement:
>   U+1DBF MODIFIER LETTER SMALL THETA
>
> Mathematical Alphanumeric Symbols:
>   U+1D6AF MATHEMATICAL BOLD CAPITAL THETA
>   U+1D6B9 MATHEMATICAL BOLD CAPITAL THETA SYMBOL
>   U+1D6C9 MATHEMATICAL BOLD SMALL THETA
>   (... 17 more Thetas in this group! ...)
>
> If you were to pick a definitive set of Greek characters for your use case,
> would it be in the Mathematical Alphanumeric Symbols category? Would others'
> expected use of Greek characters match yours, or would it need to be
> inclusive of all Greek characters across groupings?
>
> I'm beginning to sense a metal container containing wriggly things...
>

But I think you've also nailed the correct solution. Python comes with [1] a
unicodedata module, which would be the best way to define these sorts of
sets. It's a tad messy to try to gather the correct elements though, so
maybe the best way to do this would be a unicodedata.search() function
that returns a string of all characters with a particular string in
their names, or something like that.

ChrisA

[1] technically, CPython and many other implementations come with it, but
there are some (e.g. uPy) that don't
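
For illustration, here is a rough sketch of what the search idea could look
like today using only the existing unicodedata.name() call. Note that
unicodedata.search() does not exist; the standalone search() function below,
its name, and its case-insensitive substring matching are all assumptions
made for this example, not a proposed final API.

```python
import sys
import unicodedata


def search(fragment: str) -> str:
    """Return a string of every character whose Unicode name contains
    `fragment` (hypothetical helper, not part of unicodedata)."""
    # Unicode character names are upper case, so normalise the query.
    fragment = fragment.upper()
    return "".join(
        ch
        for ch in map(chr, range(sys.maxunicode + 1))
        # Unnamed code points (controls, surrogates, unassigned) fall back
        # to "" and therefore never match a non-empty fragment.
        if fragment in unicodedata.name(ch, "")
    )


thetas = search("theta")
for ch in thetas:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

This scans every code point on each call, so a real implementation would
probably want an index. Plain substring matching is also blunt: searching
for "ETA" would pick up BETA, ZETA and THETA as well, and the result still
spans several blocks, so the question of which Thetas count is left to the
caller.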
