Scott David Daniels wrote:
>> >>> int(u"\N{DEVANAGARI DIGIT SEVEN}")
>> 7
>
> OK, That much I have handled. I am fiddling with direct-to-number
> conversions and wondering about cases like
> >>> int(u"\N{DEVANAGARI DIGIT SEVEN}" + XXX
>         + u"\N{DEVANAGARI DIGIT SEVEN}")

int() passes NULL as the error mode, which is equivalent to "strict". So
if you hit an unencodable character, you get a UnicodeError.

> I don't really understand how the "ignore" or "something_else"
> cases get caused by python source [where they come from]. Are they
> only there for C-program access?

Neither. This code is dead.

>> In the "ignore" case, no output is produced at all, for the unencodable
>> character; this is the same way that '?' would be treated (it is
>> also unencodable).
>
> If I understand you correctly -- I can consider the digit stream to stop
> as soon as I hit a non-digit (except for handling bases 11-36).

No. In "ignore" mode, a codec doesn't stop at the unencodable character.
Instead, it skips that character and continues with the next one.

I mistakenly said that this would also happen to '?' (question mark);
that is incorrect: PyUnicode_EncodeDecimal copies all Latin-1 characters
to the output, Latin-1-encoded. So '?' would appear in the output, even
in "ignore" mode.

Handling of bases is not done in the function at all. Instead, the
callers of PyUnicode_EncodeDecimal deal with number formats (base,
prefix, exponent syntax, etc.). They assume ASCII bytes.

Regards,
Martin
--
http://mail.python.org/mailman/listinfo/python-list
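
For illustration only, the difference between "strict" and "ignore"
described in the message can be modelled in pure Python roughly as
follows. This is a sketch, not the C code of PyUnicode_EncodeDecimal:
the name encode_decimal, its signature, and the error message are
hypothetical, it assumes Python 3 strings, and it covers only the
behaviour mentioned above (decimal digits, Latin-1 pass-through, and the
two error modes).

```python
import unicodedata


def encode_decimal(text, errors=None):
    """Hypothetical model of the behaviour described in the message.

    errors=None is treated like "strict", as the message says int() does.
    """
    out = []
    for pos, ch in enumerate(text):
        digit = unicodedata.decimal(ch, None)
        if digit is not None:
            # Any Unicode decimal digit becomes the matching ASCII digit.
            out.append(str(digit))
        elif ord(ch) < 256:
            # Latin-1 characters, including '?', are copied through as-is.
            out.append(ch)
        elif errors == "ignore":
            # Skip the unencodable character and carry on with the next one.
            continue
        else:
            # None or "strict": refuse the unencodable character.
            raise UnicodeEncodeError(
                "decimal", text, pos, pos + 1, "not a decimal digit"
            )
    return "".join(out)


seven = "\N{DEVANAGARI DIGIT SEVEN}"
ka = "\N{DEVANAGARI LETTER KA}"  # not a digit and not Latin-1

plain = encode_decimal(seven)                            # single digit
with_question = encode_decimal(seven + "?" + seven, "ignore")  # '?' survives
skipped = encode_decimal(seven + ka + seven, "ignore")   # KA is dropped
```

Under this model, the caller would still be the one to reject a string
such as the one containing '?', since the function only hands back
ASCII-compatible text and knows nothing about bases or number syntax.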