On Thu, Jun 22, 2017 at 11:33 PM, Steve D'Aprano
<steve+pyt...@pearwood.info> wrote:
> and besides some Unicode code points are not
> characters at all).
>
> http://www.unicode.org/faq/private_use.html#noncharacters

AIUI, "noncharacters" are like the IEEE floating point value
"not-a-number". If you ask for the type of NaN in Python, it's "float",
which is a numeric type. (It's funnier in JavaScript, where 'typeof NaN'
is "number".) Noncharacters are completely well-defined for pretty much
everything you would use a string for, the sole exception being
displaying them to a human (at which point a boatload of other
complexities kick in too, e.g. directionality (LTR/RTL), combining
characters, fonts lacking certain glyphs, text wrapping, etc). So a
character count should normally *include* any noncharacters in the
string.

But honestly, I don't know where a character count is the right choice
of measurement. If you're limiting the size of user input, you probably
want to count code points (so people don't just put five billion
combining characters onto a single base), and if you're going to count
combined characters, you often want to be measuring in glyphs (or maybe
pixels) so it actually corresponds to the displayed text.

Got any examples of where you want to count characters? And if so, do
those situations govern the definition of "character"?

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
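
For anyone who wants to poke at the NaN analogy and the code point
versus combined-character distinction, here is a rough sketch. The
helper names (count_code_points, count_base_characters) are made up for
illustration, and skipping combining marks is only an approximation of
"combined characters", not real grapheme cluster segmentation.

```python
import unicodedata

nan = float("nan")       # "not-a-number", yet its type is float
nonchar = "\uffff"       # U+FFFF is one of the Unicode noncharacters

# 'e' + COMBINING ACUTE ACCENT (U+0301) + the noncharacter
sample = "e\u0301" + nonchar


def count_code_points(text):
    """Count code points; in Python 3 this is simply len()."""
    return len(text)


def count_base_characters(text):
    """Hypothetical helper: count code points that are not combining marks.

    Only a rough stand-in for counting combined characters.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


print(type(nan).__name__)             # float
print(count_code_points(sample))      # 3
print(count_base_characters(sample))  # 2
```

Note that the noncharacter is counted in both totals; it is stored and
measured like any other code point.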