SAXParseException: not well-formed (invalid token)

2007-08-30 Thread Pablo Rey
Dear Colleagues,

I am getting the following error with an XML page:

>   File "/home/prey/RAL-CESGA/bin/voms2users/voms2users.py", line 69, in getItems
>     d = minidom.parseString(xml.read())
>   File "/usr/lib/python2.2/site-packages/_xmlplus/dom/minidom.py", line 967, in parseString
>     return _doparse(pulldom.parseString, args, kwargs)
>   File "/usr/lib/python2.2/site-packages/_xmlplus/dom/minidom.py", line 954, in _doparse
>     toktype, rootNode = events.getEvent()
>   File "/usr/lib/python2.2/site-packages/_xmlplus/dom/pulldom.py", line 265, in getEvent
>     self.parser.feed(buf)
>   File "/usr/lib/python2.2/site-packages/_xmlplus/sax/expatreader.py", line 208, in feed
>     self._err_handler.fatalError(exc)
>   File "/usr/lib/python2.2/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
>     raise exception
> xml.sax._exceptions.SAXParseException: :553:48: not well-formed (invalid token)


> def getItems(page):
>     opener = urllib.URLopener(key_file=HOSTKEY, cert_file=HOSTCERT)
>     try:
>         xml = opener.open(page)
>     except:
>         return []
>
>     d = minidom.parseString(xml.read())
>     items = d.getElementsByTagName('item')
>     data = []
>     for i in items:
>         data.append(getText(i.childNodes))
>
>     return data

The page is
https://lcg-voms.cern.ch:8443/voms/cms/services/VOMSCompatibility?method=getGridmapUsers

and the line containing the invalid character (the final é of Université) is:

/C=BE/O=BEGRID/OU=Physique/OU=Univesité Catholique de Louvain/CN=Roberfroid


I have tried several options but I have not been able to get around this
problem. Any ideas?

I am just starting to work with Python, so I apologize if this problem is
trivial.

Thanks for your time.
Pablo Rey
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: SAXParseException: not well-formed (invalid token)

2007-08-30 Thread Pablo Rey
Hi Stefan,

The XML does specify an encoding ().

Regarding your suggestion to recode the input, could you let me know how
to do it? I am sorry, but I am just starting with Python and I don't know
how.

Thanks for your help.
Pablo



On 30/08/2007 14:37, Stefan Behnel wrote:
> Looks like the page is not well-formed XML (i.e. not XML at all). If it
> doesn't specify an encoding (), you can try recoding the
> input, possibly decoding it from latin-1 and re-encoding it as UTF-8 before
> passing it to the SAX parser.
> 
> Alternatively, tell the page authors to fix their page.
> 
> Stefan
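
The recoding Stefan describes could look roughly like the sketch below. It is
only an illustration under stated assumptions: it is written for Python 3 (the
thread uses Python 2.2, where the same decode/encode chain on the string read
from the URL should apply), the helper name parse_recoded and the sample
document are hypothetical, and it assumes the page really is Latin-1 with no
encoding declaration.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parse_recoded(raw, source_encoding="latin-1"):
    """Decode raw bytes from source_encoding, re-encode as UTF-8, then parse.

    parse_recoded is a hypothetical helper, not part of minidom.
    """
    text = raw.decode(source_encoding)
    return minidom.parseString(text.encode("utf-8"))


# Hypothetical sample, not the real VOMS page: the é is the single Latin-1
# byte 0xE9 and there is no encoding declaration, so the parser assumes UTF-8.
raw = b"<items><item>/OU=Universit\xe9 Catholique de Louvain</item></items>"

# Parsing the raw bytes directly fails, because 0xE9 followed by a space
# is not a valid UTF-8 sequence.
try:
    minidom.parseString(raw)
    failed = False
except ExpatError:
    failed = True

# After recoding, the same document parses and the é survives intact.
doc = parse_recoded(raw)
value = doc.getElementsByTagName("item")[0].firstChild.data
```

In getItems, this would mean recoding the result of xml.read() before handing
it to minidom.parseString. If the page does carry an encoding declaration, as
Pablo reports, that declaration would also have to agree with the bytes
actually passed to the parser, otherwise the recoding may not help.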