Bugs item #1452697 was opened at 2006-03-18 05:07
Message generated for change (Comment added) made by ocean-city
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1452697&group_id=5470
Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request,
not just the latest update.

Category: Unicode
Group: Python 2.4
>Status: Deleted
Resolution: None
Priority: 7
Submitted By: ocean-city (ocean-city)
Assigned to: M.-A. Lemburg (lemburg)
Summary: broken string on mbcs

Initial Comment:
Hello. I noticed that Unicode conversion from mbcs was sometimes broken.
This happened when I used codecs.open("foo", "r", "mbcs") as an
iterator.
# It's OK if I use "shift_jis" or "cp932".
I'll attach the script and text file to reproduce the problem.
I'm using Win2000 SP4 (Japanese). Thank you.

----------------------------------------------------------------------

>Comment By: ocean-city (ocean-city)
Date: 2006-03-22 16:14

Message:
Logged In: YES
user_id=1200846

I'll move this to the "Patches" tracker.

----------------------------------------------------------------------

Comment By: ocean-city (ocean-city)
Date: 2006-03-19 11:08

Message:
Logged In: YES
user_id=1200846

I updated the patch. Compared to version 1:

* [bug] consumed should be 0 if the length of the string is 0.
* [enhancement] use IsDBCSLeadByte to detect a buffer that ends in an
  incomplete character, instead of retrying MultiByteToWideChar with
  MB_ERR_INVALID_CHARS. The retry approach could cause a performance
  hit if the string contains invalid characters early on.

----------------------------------------------------------------------

Comment By: ocean-city (ocean-city)
Date: 2006-03-18 13:17

Message:
Logged In: YES
user_id=1200846

Probably this patch will fix the problem (for release24-maint).

Cause: MultiByteToWideChar returns a nonzero value for an incomplete
multibyte character. For example, if the buffer ends with a lead byte,
MultiByteToWideChar returns 1 (not 0) for it. It should return 0;
otherwise the result will be broken.

Solution: Set the MB_ERR_INVALID_CHARS flag to avoid incorrect handling
of a trailing incomplete multibyte sequence. If an error occurs, remove
the trailing byte and try again.
Caution: I have not tested this very intensively.

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list