New submission from Dmitry Chichkov <dchich...@gmail.com>:

The namespace_separator argument is hard-coded in the cElementTree.XMLParser 
class, so there is no way to ignore XML namespaces when using the cElementTree 
library.

Here's the code example:
 from xml.etree.cElementTree import iterparse
 from StringIO import StringIO
 xml = """<root xmlns="http://www.very_long_url.com"><child/></root>"""
 for event, elem in iterparse(StringIO(xml)): print event, elem

It produces:
 end <Element '{http://www.very_long_url.com}child' at 0xb7ddfa58>
 end <Element '{http://www.very_long_url.com}root' at 0xb7ddfa40> 

In the current implementation local tag names are always prefixed with their 
namespace URI. This often leads to ugly code on the user's side and to 
performance degradation (at least from the extra concatenations and the longer 
string comparisons in element-matching code).
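
As an illustration of the user-side workaround described above, here is a 
minimal sketch. It assumes a Python where xml.etree.ElementTree provides the 
same iterparse API as cElementTree; local_name is a hypothetical helper, not 
part of the library.

```python
from io import BytesIO
from xml.etree.ElementTree import iterparse

def local_name(tag):
    """Return the tag without its '{uri}' prefix (hypothetical helper)."""
    # Namespaced tags look like '{uri}local'; other tags are returned unchanged.
    if tag.startswith("{"):
        return tag.split("}", 1)[1]
    return tag

xml = b'<root xmlns="http://www.very_long_url.com"><child/></root>'

# Every tag comparison either goes through the helper or spells out the URI.
names = [local_name(elem.tag) for event, elem in iterparse(BytesIO(xml))]
print(names)
```

The extra split on every tag is the kind of overhead that an option to skip 
namespace processing would avoid.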

Internally cElementTree uses the Expat parser, in which namespace processing 
is optional and is enabled by passing a value for the namespace_separator 
argument. In cElementTree this argument is hard-coded: 
 self->parser = EXPAT(ParserCreate_MM)(encoding, &memory_handler, "}");
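
The effect of this argument can be observed from Python through the pyexpat 
binding, which takes the same namespace_separator value. The sketch below is 
only an illustration of Expat's behavior, not of the cElementTree internals; 
element_names is a hypothetical helper.

```python
import xml.parsers.expat

XML = '<root xmlns="http://www.very_long_url.com"><child/></root>'

def element_names(namespace_separator):
    """Collect element names as Expat reports them (hypothetical helper)."""
    names = []
    parser = xml.parsers.expat.ParserCreate(
        namespace_separator=namespace_separator)
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(XML, True)  # True marks the final chunk of input
    return names

# With a separator, Expat reports 'uri' + separator + 'local name'.
with_ns = element_names("}")
# With None, namespace processing is off and names are reported as written.
without_ns = element_names(None)
print(with_ns)
print(without_ns)
```

With "}" as the separator the names come back as 'uri}local', which is why 
cElementTree only needs to prepend "{" to build its '{uri}local' tags.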

Attached is a patch that exposes this parameter as an argument of 
cElementTree.XMLParser(). The parameter is optional, and the default 
behavior should be unchanged.  Here's the test code:

import cElementTree

x = """<root xmlns="http://www.very_long_url.com"><child>text</child></root>"""

parser = cElementTree.XMLParser()
parser.feed(x)
elem = parser.close()
print elem

parser = cElementTree.XMLParser(namespace_separator="}")
parser.feed(x)
elem = parser.close()
print elem

parser = cElementTree.XMLParser(namespace_separator=None)
parser.feed(x)
elem = parser.close()
print elem

The resulting output:
<Element '{http://www.very_long_url.com}root' at 0xb7e885f0>
<Element '{http://www.very_long_url.com}root' at 0xb7e88608>
<Element 'root' at 0xb7e88458>

----------
components: Library (Lib)
messages: 104671
nosy: dmtr
priority: normal
severity: normal
status: open
title: Hardcoded namespace_separator in the cElementTree.XMLParser
type: performance
versions: Python 2.5, Python 2.6, Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8583>
_______________________________________