Bugs item #904474, was opened at 2004-02-25 11:30
Message generated for change (Settings changed) made by nnorwitz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=904474&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: Unicode
Group: None
>Status: Closed
>Resolution: Invalid
Priority: 5
Submitted By: Ron Rother (rrother)
Assigned to: Nobody/Anonymous (nobody)
Summary: File read of Chinese utf-16-le treats upper byte 1A as EOF

Initial Comment:
Any utf-16-le Chinese character with 1A as the most significant byte
causes the remainder of the file to be ignored.

code extract:

    (utf16_encoder, utf16_decoder, utf16_reader, utf16_writer) = codecs.lookup("utf-16-le")
    ifile = utf16_reader(open(sys.argv[1],"r"))
    t=ifile.read()

When the Chinese character 1A 5C (尚) is encountered, everything
from the 5C is discarded.

These 3 lines:

    English="You have not selected any books!"
    Context=1,[MsgBox "You have not selected any books!"]
    Chinese(Simplified)="尚未选择任何书卷!"

are input as:

    English="You have not selected any books!"
    Context=1,[MsgBox "You have not selected any books!"]
    Chinese(Simplified)="

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2005-10-02 18:19

Message:
Logged In: YES
user_id=33168

MAL, this seems to come up from time to time. Perhaps we
should update the doc for open()? If it's already
documented, could we make it clearer? Then we should be
able to close this bug.

I think I saw another bug recently that was similar to this one.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2004-02-25 14:53

Message:
Logged In: YES
user_id=38388

I believe there is a misconception here: the open(..., "r")
will cause the file to be opened in C lib's text mode. Since
UTF-16 is binary data, this will lead to problems with line
breaking and file handling in general.

You should try:

    import codecs
    ifile = codecs.open(filename, 'rb', encoding='utf-16-le')
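
For illustration, here is a minimal Python 3 sketch of the binary-mode approach suggested in the thread. It assumes the reporter was on a platform (most likely Windows, though the thread does not say so) whose C library treats the byte 0x1A (Ctrl-Z) as end-of-file in text mode. The helper name read_utf16le and the temporary file are hypothetical and are not part of the original report.

```python
import codecs
import os
import tempfile

# 尚 is U+5C1A; UTF-16-LE stores the low byte first, giving the bytes 1A 5C.
assert "尚".encode("utf-16-le") == b"\x1a\x5c"


def read_utf16le(path):
    """Hypothetical helper: read a UTF-16-LE file through a binary-mode handle."""
    # "rb" hands the raw bytes to the decoder, so no byte (such as 0x1A)
    # is given special meaning before decoding.
    with open(path, "rb") as raw:
        reader = codecs.getreader("utf-16-le")(raw)
        return reader.read()


# Round-trip the sample line from the report through a temporary file.
sample = 'Chinese(Simplified)="尚未选择任何书卷!"\n'
fd, tmp_path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as out:
        out.write(sample.encode("utf-16-le"))
    decoded = read_utf16le(tmp_path)
finally:
    os.remove(tmp_path)

print(decoded == sample)
```

The first assertion shows why this particular character triggers the problem: its encoded form begins with the byte 0x1A. Reading through a handle opened with "rb" leaves that byte for the UTF-16 decoder to interpret.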