New submission from STINNER Victor:

(Follow-up to issues #20538 and #20571.)

The attached patch implements incremental decoders for multibyte code pages (on Windows), especially for CP_UTF8, known as "cp65001" in Python.
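To illustrate what "incremental" means here, the following is a minimal sketch, not taken from the patch. It uses the "utf-8" codec as a stand-in (an assumption made for portability, since "cp65001" is only available on Windows) and feeds the decoder chunks that are cut in the middle of multibyte sequences. The decoder has to buffer the incomplete trailing bytes instead of raising an error.

```python
import codecs

# Stand-in codec: "utf-8" is assumed here because "cp65001" only exists on Windows.
decoder = codecs.getincrementaldecoder("utf-8")()

# U+00E9 is 2 bytes and U+20AC is 3 bytes in UTF-8: b'\xc3\xa9\xe2\x82\xac'
data = "\u00e9\u20ac".encode("utf-8")

# Split so that every chunk boundary falls inside a multibyte sequence.
chunks = [data[:1], data[1:3], data[3:]]

# Each call returns only the characters that are complete so far;
# a truncated trailing sequence is kept in the decoder's internal buffer.
pieces = [decoder.decode(chunk) for chunk in chunks]

# final=True flushes the decoder and raises if incomplete bytes remain.
pieces.append(decoder.decode(b"", final=True))

result = "".join(pieces)
```

The first call returns an empty string because only a lead byte has been seen. A non-incremental decode of that same single byte would fail.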
Code pages 932, 936, 949, 950 and 1361 have had an incremental decoder since:

---
changeset:   38817:549c547700af
branch:      legacy-trunk
user:        Martin v. Löwis <mar...@v.loewis.de>
date:        Wed Jun 14 05:21:04 2006 +0000
files:       Doc/api/concrete.tex Include/unicodeobject.h Lib/encodings/mbcs.py Misc/NEWS Modules/_codecsmodule.c Objects/unicodeobject.c
description:
Patch #1455898: Incremental mode for "mbcs" codec.
---

Python currently uses IsDBCSLeadByteEx():
http://msdn.microsoft.com/en-us/library/windows/desktop/dd318667%28v=vs.85%29.aspx

And CharPrevA():
http://msdn.microsoft.com/en-us/library/windows/desktop/ms647471%28v=vs.85%29.aspx

But IsDBCSLeadByteEx() only supports code pages 932, 936, 949, 950 and 1361.

Python has supported code page 65001 (codec "cp65001") since Python 3.3. New tests on incremental decoders were added in Python 3.4: I added a skip for cp65001 since it was not supported (#20571). This issue implements the incremental decoder and so removes the skip.

I prefer to wait for Python 3.5 rather than rush to add this new feature after 3.4 beta 3. cp65001 is mostly used for output (sys.stdout/sys.stderr) on Windows, not for input.

----------
files: incremental_cp_utf8.patch
keywords: patch
messages: 210759
nosy: haypo, larry, loewis, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Implement incremental decoder for cp65001
type: enhancement
versions: Python 3.5
Added file: http://bugs.python.org/file34008/incremental_cp_utf8.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20574>
_______________________________________