New submission from Lin Tian:

utf8_toUtf8(const ENCODING *UNUSED_P(enc),
            const char **fromP, const char *fromLim,
            char **toP, const char *toLim)
{
  char *to;
  const char *from;
  const char *fromLimInitial = fromLim;
  /* Avoid copying partial characters. */
  align_limit_to_full_utf8_characters(*fromP, &fromLim);

  for (to = *toP, from = *fromP; (from < fromLim) && (to < toLim); from++, to++)
    *to = *from;
  *fromP = from;
  *toP = to;

  if (fromLim < fromLimInitial)
    return XML_CONVERT_INPUT_INCOMPLETE;
  else if ((to == toLim) && (from < fromLim))
    // <===== Bug is here. In case (to == toLim), it's possible that
    // from is still pointing into a partial character. For example,
    // take a character with 3 bytes (A, B, C) where from is pointing to C.
    // It means only A and B were copied to the output buffer. The next
    // scan will start with C, which could be considered an invalid
    // byte and dropped. After this, only "AB" is kept in memory,
    // and thus it will lead to an invalid continuation byte.
    return XML_CONVERT_OUTPUT_EXHAUSTED;
  else
    return XML_CONVERT_COMPLETED;
}

----------
components: Library (Lib)
messages: 300043
nosy: Lin Tian
priority: normal
severity: normal
status: open
title: expat: utf8_toUtf8 cannot properly handle exhausting buffer
type: behavior
versions: Python 3.6, Python 3.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue31170>
_______________________________________
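
For illustration only, one possible way to avoid the split character described in the report is to clamp the input limit to the space left in the output buffer first, and only then trim it back to a character boundary. The sketch below is standalone and is not necessarily the change adopted upstream; the names copy_utf8_whole_chars, trim_to_complete_utf8 and the convert_result enum are hypothetical and are not part of expat. It assumes the input is well-formed UTF-8 apart from a possibly truncated final character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes mirroring the three outcomes in the report. */
enum convert_result {
  CONVERT_COMPLETED,
  CONVERT_INPUT_INCOMPLETE,
  CONVERT_OUTPUT_EXHAUSTED
};

/* Hypothetical helper: move *lim back so that [from, *lim) does not end
   in the middle of a multi-byte character. */
static void trim_to_complete_utf8(const char *from, const char **lim) {
  const char *p = *lim;
  while (p > from) {
    unsigned char c = (unsigned char)p[-1];
    p--;
    if ((c & 0xC0) != 0x80) { /* p is at an ASCII or lead byte */
      size_t need = c < 0x80 ? 1
                  : (c & 0xE0) == 0xC0 ? 2
                  : (c & 0xF0) == 0xE0 ? 3 : 4;
      if ((size_t)(*lim - p) < need)
        *lim = p; /* the last character is truncated: exclude it */
      return;
    }
  }
  /* Only continuation bytes were found: malformed input, left as is. */
}

/* Copy whole characters only, never more than the output can hold. */
static enum convert_result copy_utf8_whole_chars(const char **fromP,
                                                 const char *fromLim,
                                                 char **toP,
                                                 const char *toLim) {
  int output_exhausted = 0;
  int input_incomplete = 0;
  assert(*fromP <= fromLim && *toP <= toLim);

  /* Step 1: never consider more input than the output has room for. */
  ptrdiff_t available = fromLim - *fromP;
  ptrdiff_t storable = toLim - *toP;
  if (available > storable) {
    fromLim = *fromP + storable;
    output_exhausted = 1;
  }

  /* Step 2: trim the (possibly clamped) limit to a character boundary. */
  const char *before = fromLim;
  trim_to_complete_utf8(*fromP, &fromLim);
  if (fromLim < before)
    input_incomplete = 1;

  /* Step 3: copy; both pointers now stop on a character boundary. */
  size_t n = (size_t)(fromLim - *fromP);
  memcpy(*toP, *fromP, n);
  *fromP += n;
  *toP += n;

  if (output_exhausted)
    return CONVERT_OUTPUT_EXHAUSTED;
  if (input_incomplete)
    return CONVERT_INPUT_INCOMPLETE;
  return CONVERT_COMPLETED;
}
```

With the 4-byte input "a" followed by a 3-byte character and a 3-byte output buffer, this sketch copies only "a" and reports exhausted output, leaving the input pointer at the start of the 3-byte character. A caller whose remaining buffer is smaller than the next character gets the same result with zero bytes copied and has to supply more space before retrying.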