[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2013-08-04 Thread Roundup Robot
Roundup Robot added the comment: New changeset b3efc140d8a6 by Eli Bendersky in branch '2.7': Issue #13612: Fix a buffer overflow in case of a multi-byte encoding. http://hg.python.org/cpython/rev/b3efc140d8a6

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2013-08-04 Thread Eli Bendersky
Eli Bendersky added the comment: Thanks, Serhiy. -- resolution: -> fixed; stage: patch review -> committed/rejected; status: open -> closed

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2013-08-04 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Here is a patch for 2.7. -- Added file: http://bugs.python.org/file31150/expat_buffer_overflow-2.7.patch

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2013-08-01 Thread Eli Bendersky
Eli Bendersky added the comment: Serhiy, do you want to backport the buffer overflow fix to 2.7?

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2013-05-25 Thread Eli Bendersky
Eli Bendersky added the comment: Oh, I didn't notice that your patches had duplication in the tests. Fixed now. I'll wait to see what unfolds for the Misc/NEWS discussion on python-committers.

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2013-05-25 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Tests were duplicated. There is no Misc/NEWS entry. I think a buffer overflow is critical enough for backporting.

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2013-05-25 Thread Eli Bendersky
Eli Bendersky added the comment: A few notes: 1. If by C API version you mean PyExpat_CAPI_MAGIC, I'm not sure what difference that makes. It has never been updated and it's also being used only in _elementtree. Since the latter is statically compiled against pyexpat, I don't see a reason to

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2013-05-25 Thread Roundup Robot
Roundup Robot added the comment: New changeset f7b47fb30169 by Eli Bendersky in branch '3.3': Issue #13612: handle unknown encodings without a buffer overflow. http://hg.python.org/cpython/rev/f7b47fb30169 New changeset 47e719b11c46 by Eli Bendersky in branch 'default': Issue #13612: handle unkn

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2013-05-23 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: LGTM. But why not just inline expat_unknown_encoding_handler()? For 2.7 we perhaps should add SetStartDoctypeDeclHandler and SetEncoding in the exported C API. Shouldn't we update the C API version (for this issue and for issue16986)?

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2013-05-23 Thread Eli Bendersky
Eli Bendersky added the comment: How about this patch? (I haven't tested it much; it's just a proof of concept.) We're pretty free in the C API exported by pyexpat through a capsule to _elementtree, so we can also add a default handler there. This API already has some general utilities like ErrorS

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2013-05-22 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Maybe. But it needs more work for building. It is simpler just to duplicate the code (the function is small enough) and add a comment: /* The code is duplicated as xx() in xxx.c. Keep both functions synchronized. */

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2013-05-22 Thread Eli Bendersky
Eli Bendersky added the comment: Serhiy, would it make sense to share the code somewhere instead of duplicating it?

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2013-05-22 Thread Serhiy Storchaka
Changes by Serhiy Storchaka: -- dependencies: -ElementTree incorrectly parses strings with declared encoding not UTF-8

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2013-05-22 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Here is an updated patch. PyUnknownEncodingHandler() and expat_unknown_encoding_handler() are synchronized. Added tests. -- Added file: http://bugs.python.org/file30342/expat_unknown_encoding_handler_2.patch

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2013-05-22 Thread Eli Bendersky
Eli Bendersky added the comment: Looked at Serhiy's patch here too: LGTM with a unit test :)

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2013-05-22 Thread Eli Bendersky
Eli Bendersky added the comment: > For unit tests we first should fix issue16986. I did another round of code review on issue 16986 now.

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2013-05-22 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: For unit tests we first should fix issue16986.

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2013-05-22 Thread Serhiy Storchaka
Changes by Serhiy Storchaka: -- dependencies: +ElementTree incorrectly parses strings with declared encoding not UTF-8

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2013-05-21 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: The patch goes in the right direction: consistently reject non-8bit encodings which the current implementation does not support. - please add a unit test - remove usage of PyUnicodeObject and the "Stupid to access directly" comment, they are outdated. Re
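From the caller's side, "consistently reject" might look roughly like the sketch below. It assumes a CPython build that already includes the fix for this issue; `try_parse` is a hypothetical helper made up for illustration, and the exception type raised for a multi-byte encoding may differ between versions, so the sketch catches several.

```python
import xml.etree.ElementTree as ET


def try_parse(encoding):
    """Parse a tiny ASCII-only document that declares `encoding`.

    Hypothetical helper: returns the root tag on success, or the
    exception class name if the parser rejects the document.
    """
    header = '<?xml version="1.0" encoding="%s"?>' % encoding
    data = (header + "<root/>").encode("ascii")
    try:
        return ET.fromstring(data).tag
    except (ValueError, LookupError, ET.ParseError) as exc:
        return type(exc).__name__


# An 8-bit encoding should parse; a multi-byte one is expected to be rejected.
for name in ("iso-8859-1", "cp1252", "GBK"):
    print(name, "->", try_parse(name))
```

Returning the exception name rather than re-raising keeps the comparison between encodings on one line each; real code would normally let the error propagate.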

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2013-05-20 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I can only propose raising a specialized exception instead of the general xml.etree.ElementTree.ParseError. Here is a patch. It also fixes a buffer overread bug mentioned by Amaury. -- components: +Extension Modules, Unicode, XML -Library (Lib) keywords: +

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2013-05-20 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: PyUnknownEncodingHandler gets an encoding name and fills a special XML_Encoding structure which the expat parser uses for decoding. Currently PyUnknownEncodingHandler works only with 8-bit encodings and I don't see an efficient way to extend it to handle gen
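The per-byte table described in this comment can be sketched in Python. This is only an illustration of the idea, assuming the handler decodes the 256 single-byte values and records one code point per byte; `build_byte_map` is a hypothetical name, not a pyexpat function, and the real handler is C code in pyexpat.c.

```python
def build_byte_map(encoding):
    """Return a 256-entry list mapping each byte value to a code point.

    Hypothetical helper: an entry is -1 when the codec cannot decode
    that byte. Raises ValueError when the codec does not produce
    exactly one character per byte, i.e. it is not an 8-bit encoding.
    """
    decoded = bytes(range(256)).decode(encoding, "replace")
    if len(decoded) != 256:
        raise ValueError("multi-byte encodings are not supported")
    # U+FFFD is what the "replace" error handler emits for bad bytes.
    return [-1 if ch == "\ufffd" else ord(ch) for ch in decoded]


table = build_byte_map("cp1252")
print(hex(table[0x80]))  # code point that cp1252 assigns to byte 0x80
```

A fixed 256-entry table only works when every character is one byte long, which is why a general multi-byte encoding does not fit this scheme.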

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2013-05-20 Thread Terry J. Reedy
Terry J. Reedy added the comment: 3.3 shifted the wide-build problem to all builds ;-). I now get
  File "C:\Python\mypy\tem.py", line 4, in
    xmlet.fromstring(s)
  File "C:...33\lib\xml\etree\ElementTree.py", line 1356, in XML
    parser.feed(text)
  File "", line None
xml.etree.ElementTree.

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2013-05-20 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: encoding="GBK" causes a buffer overflow in PyUnknownEncodingHandler, because the result of PyUnicode_Decode() is only 192 characters long. Exact behavior is not defined...
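The length mismatch is easy to observe from Python. A rough sketch, assuming the handler decodes all 256 byte values in a single call as described above; `decoded_length` is a name made up here, and the exact count for GBK is deliberately not asserted because it depends on how the codec handles invalid bytes.

```python
def decoded_length(encoding):
    """Number of characters obtained by decoding bytes 0x00..0xFF at once."""
    return len(bytes(range(256)).decode(encoding, "replace"))


# An 8-bit codec yields one character per byte. A multi-byte codec such
# as GBK consumes many bytes in pairs, so its result is shorter than 256.
print("latin-1:", decoded_length("latin-1"))
print("gbk:", decoded_length("gbk"))
```

Code that indexes 256 entries of the shorter result reads past its end, which matches the overflow reported here.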

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2013-05-20 Thread Eli Bendersky
Eli Bendersky added the comment: In 3.3+ there's no distinction between wide and narrow builds. Does anyone know how this should be affected? [I don't know much about unicode and encodings, unfortunately]

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2012-07-21 Thread Florent Xicluna
Changes by Florent Xicluna: -- nosy: +eli.bendersky

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2011-12-19 Thread Antoine Pitrou
Changes by Antoine Pitrou: -- nosy: +flox, haypo versions: +Python 3.2, Python 3.3

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2011-12-16 Thread Terry J. Reedy
Terry J. Reedy added the comment: On 3.2, Win7 (a narrow build), it indeed works and returns -- nosy: +terry.reedy versions: +Python 2.7 -3rd party, Python 2.6

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2011-12-16 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: Actually, this fails on 2.6 and 2.7 on wide unicode builds, and passes with narrow unicode builds (on my 64bit Linux box). In pyexpat.c, PyUnknownEncodingHandler accesses 256 characters of a unicode buffer, without checking its length... which happens t

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2011-12-16 Thread Dongying Zhang
Changes by Dongying Zhang: -- type: -> behavior

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2011-12-16 Thread Dongying Zhang
Changes by Dongying Zhang: -- versions: +3rd party

[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

2011-12-16 Thread Dongying Zhang
New submission from Dongying Zhang: I've been trying to parse an XML string using Python, with the following code:

    #-*- coding: utf-8 -*-
    import xml.etree.ElementTree as xmlet
    s = ''
    xmlet.fromstring(s)

Then:

    $ python2.6 test.py

or:

    $ pypy test.py

Traceback message came out like