Hi Stefan,
The XML does specify an encoding ().
Regarding your suggestion to recode the input, could you let me know
how to do it? I am sorry, I am just starting with Python and I
don't know how.
Thanks for your help.
Pablo
On 30/08/2007 14:37, Stefan Behnel wrote:
> Pablo Rey wrote:
>> I am getting the following error with an XML page:
>>
>>> File "/home/prey/RAL-CESGA/bin/voms2users/voms2users.py", line 69,
>>> in getItems
>>> d = minidom.parseString(xml.read())
>>> File "/usr/lib/python2.2/site-packages/_xmlplus/dom/minidom.py",
>>> line 967, in parseString
>>> return _doparse(pulldom.parseString, args, kwargs)
>>> File "/usr/lib/python2.2/site-packages/_xmlplus/dom/minidom.py",
>>> line 954, in _doparse
>>> toktype, rootNode = events.getEvent()
>>> File "/usr/lib/python2.2/site-packages/_xmlplus/dom/pulldom.py",
>>> line 265, in getEvent
>>> self.parser.feed(buf)
>>> File "/usr/lib/python2.2/site-packages/_xmlplus/sax/expatreader.py",
>>> line 208, in feed
>>> self._err_handler.fatalError(exc)
>>> File "/usr/lib/python2.2/site-packages/_xmlplus/sax/handler.py",
>>> line 38, in fatalError
>>> raise exception
>>> xml.sax._exceptions.SAXParseException: :553:48: not
>>> well-formed (invalid token)
>>
>>> def getItems(page):
>>>     opener = urllib.URLopener(key_file=HOSTKEY, cert_file=HOSTCERT)
>>>     try:
>>>         xml = opener.open(page)
>>>     except:
>>>         return []
>>>
>>>     d = minidom.parseString(xml.read())
>>>     items = d.getElementsByTagName('item')
>>>     data = []
>>>     for i in items:
>>>         data.append(getText(i.childNodes))
>>>
>>>     return data
>> The page is
>> https://lcg-voms.cern.ch:8443/voms/cms/services/VOMSCompatibility?method=getGridmapUsers
>> and the line with the invalid character is (the invalid character is the
>> final é of Université):
>>
>> /C=BE/O=BEGRID/OU=Physique/OU=Univesité Catholique de
>> Louvain/CN=Roberfroid
>>
>>
>> I have tried several options but I have not been able to avoid this
>> problem. Any ideas?
>
> Looks like the page is not well-formed XML (i.e. not XML at all). If it
> doesn't specify an encoding (), you can try recoding the
> input, possibly decoding it from latin-1 and re-encoding it as UTF-8 before
> passing it to the SAX parser.
>
> Alternatively, tell the page authors to fix their page.
>
> Stefan
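
The recoding Stefan describes could look roughly like the sketch below. It
is written for current Python 3 rather than the Python 2.2 shown in the
traceback, and it assumes the page's bytes are really Latin-1 even though
the document declares (or defaults to) UTF-8. The helper names
recode_to_utf8 and parse_with_fallback and the sample document are
hypothetical, made up for illustration only.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def recode_to_utf8(raw, source_encoding="latin-1"):
    """Decode raw bytes from source_encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def parse_with_fallback(raw, source_encoding="latin-1"):
    """Parse raw XML bytes; on a parse error, retry after recoding to UTF-8."""
    try:
        return minidom.parseString(raw)
    except ExpatError:
        # Assumption: the bytes are Latin-1 although the document declares
        # (or defaults to) UTF-8, so recoding makes them match the declaration.
        return minidom.parseString(recode_to_utf8(raw, source_encoding))


# Hypothetical sample: a UTF-8 declaration followed by a Latin-1 encoded "é".
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b"<items><item>Universit\xe9 Catholique de Louvain</item></items>"
)
doc = parse_with_fallback(sample)
text = doc.getElementsByTagName("item")[0].firstChild.data
```

In getItems, the call minidom.parseString(xml.read()) would then become
parse_with_fallback(xml.read()). If the declaration names some other
encoding, the recoded bytes would no longer match it, so this only helps
when the mismatch is the one assumed above.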