Hello,

I wrote some code to transform a raw XML string into a domish.Element, but I keep getting character encoding/decoding errors:

   class __RawXmlToElement(object):
       def __call__(self, s):
           self.result = None
           def onStart(el):
               self.result = el
           def onEnd():
               pass
           def onElement(el):
               self.result.addChild(el)
           parser = domish.elementStream()
           parser.DocumentStartEvent = onStart
           parser.ElementEvent = onElement
           parser.DocumentEndEvent = onEnd
           tmp = domish.Element(("", "s"))
           tmp.addRawXml(s)
           parser.parse(tmp.toXml())
           return self.result.firstChildElement()

   rawXmlToElement = __RawXmlToElement()


Here is a raw XML test string:

    >>> u"<t>reçu</t>"
   u'<t>re\xe7u</t>'

    >>> u"<t>reçu</t>".encode("utf-8")
   '<t>re\xc3\xa7u</t>'

    >>> "<t>reçu</t>"
   '<t>re\xc3\xa7u</t>'


As you can see, my system encodes strings in UTF-8. I tried the following, but I
keep getting errors:

    >>> rawXmlToElement("<t>reçu</t>")
   raw xml adder error : 'ascii' codec can't decode byte 0xc3 in
   position 5: ordinal not in range(128)

    >>> rawXmlToElement(u"<t>reçu</t>")
   parser error : 'ascii' codec can't encode character u'\xe7' in
   position 8: ordinal not in range(128)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "<stdin>", line 26, in __call__
   AttributeError: 'NoneType' object has no attribute 'firstChildElement'

    >>> rawXmlToElement(unicode("<t>reçu</t>", "utf-8"))
   parser error : 'ascii' codec can't encode character u'\xe7' in
   position 8: ordinal not in range(128)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "<stdin>", line 26, in __call__
   AttributeError: 'NoneType' object has no attribute 'firstChildElement'
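
For reference, the two codec errors above can be reproduced without Twisted. This is only a minimal sketch, assuming both failures come from implicit ASCII conversions of non-ASCII data and that the parser receives the string wrapped in the temporary <s> element. The helper names are hypothetical and not part of domish:

```python
# -*- coding: utf-8 -*-
# Sketch only: reproduces the two codec errors without Twisted.
# Assumption: both failures are implicit ASCII conversions.

text = u"<t>re\xe7u</t>"      # unicode string, as in the second and third calls
raw = text.encode("utf-8")    # UTF-8 byte string, as in the first call


def decode_error_position(data):
    """Return the offset where decoding bytes as ASCII fails, or None."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def encode_error_position(value):
    """Return the offset where encoding text as ASCII fails, or None."""
    try:
        value.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc.start
    return None


# Byte 0xc3 is at offset 5 of the UTF-8 bytes, as in the first error message.
decode_pos = decode_error_position(raw)

# Assumption: the parser sees the text wrapped in the temporary <s> element,
# which moves the non-ASCII character to offset 8, as in the second message.
wrapped = u"<s>" + text + u"</s>"
encode_pos = encode_error_position(wrapped)
```

The offsets 5 and 8 match the positions in the error messages, which suggests the first failure happens on the byte string passed in and the second on the wrapped unicode string handed to the parser.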


With characters that can be encoded in ASCII, it works correctly:

    >>> rawXmlToElement("<t>toto</t>").toXml()
   u'<t>toto</t>'

    >>> rawXmlToElement(u"<t>toto</t>").toXml()
   u'<t>toto</t>'

    >>> rawXmlToElement(unicode("<t>toto</t>", "utf-8")).toXml()
   u'<t>toto</t>'


Does anyone have an idea of what I'm doing wrong here? Thank you!

