Hi Stefan,

The XML does specify an encoding (<?xml version="1.0" encoding="UTF-8" ?>).
Regarding your suggestion to recode the input, could you tell me how to
do it? Sorry, I am just starting with Python and don't know how.

Thanks for your help.
Pablo

On 30/08/2007 14:37, Stefan Behnel wrote:
> Pablo Rey wrote:
>> I am getting the following error with an XML page:
>>
>>>   File "/home/prey/RAL-CESGA/bin/voms2users/voms2users.py", line 69, in getItems
>>>     d = minidom.parseString(xml.read())
>>>   File "/usr/lib/python2.2/site-packages/_xmlplus/dom/minidom.py", line 967, in parseString
>>>     return _doparse(pulldom.parseString, args, kwargs)
>>>   File "/usr/lib/python2.2/site-packages/_xmlplus/dom/minidom.py", line 954, in _doparse
>>>     toktype, rootNode = events.getEvent()
>>>   File "/usr/lib/python2.2/site-packages/_xmlplus/dom/pulldom.py", line 265, in getEvent
>>>     self.parser.feed(buf)
>>>   File "/usr/lib/python2.2/site-packages/_xmlplus/sax/expatreader.py", line 208, in feed
>>>     self._err_handler.fatalError(exc)
>>>   File "/usr/lib/python2.2/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
>>>     raise exception
>>> xml.sax._exceptions.SAXParseException: <unknown>:553:48: not well-formed (invalid token)
>>
>>> def getItems(page):
>>>     opener = urllib.URLopener(key_file=HOSTKEY, cert_file=HOSTCERT)
>>>     try:
>>>         xml = opener.open(page)
>>>     except:
>>>         return []
>>>
>>>     d = minidom.parseString(xml.read())
>>>     items = d.getElementsByTagName('item')
>>>     data = []
>>>     for i in items:
>>>         data.append(getText(i.childNodes))
>>>
>>>     return data
>>
>> The page is
>> https://lcg-voms.cern.ch:8443/voms/cms/services/VOMSCompatibility?method=getGridmapUsers
>> and the line with the invalid character is (the invalid character is
>> the final é of Université):
>>
>> <item>/C=BE/O=BEGRID/OU=Physique/OU=Univesité Catholique de
>> Louvain/CN=Roberfroid</item>
>>
>> I have tried several options but I am not able to avoid this
>> problem. Any idea?
>
> Looks like the page is not well-formed XML (i.e. not XML at all). If it
> doesn't specify an encoding (<?xml encoding="..."?>), you can try
> recoding the input, possibly decoding it from latin-1 and re-encoding
> it as UTF-8 before passing it to the SAX parser.
>
> Alternatively, tell the page authors to fix their page.
>
> Stefan
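
The recoding step suggested above (decode from latin-1, re-encode as
UTF-8, then parse) could look roughly like the sketch below. It is
written for a current Python 3 rather than the 2.2 install in the
traceback, and it assumes the page declares UTF-8 but actually contains
latin-1 bytes, which is only a guess based on the é that trips the
parser. The names recode_to_utf8 and parse_items and the sample document
are made up for illustration; they are not part of voms2users.py.

```python
from xml.dom import minidom


def recode_to_utf8(raw, fallback="latin-1"):
    """Return raw bytes as valid UTF-8.

    Assumption: if the bytes are not valid UTF-8, the whole document
    is in the fallback encoding (latin-1 here).
    """
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, leave untouched
    except UnicodeDecodeError:
        # Decode with the guessed encoding, then re-encode as UTF-8 so
        # the bytes match the encoding="UTF-8" declaration.
        return raw.decode(fallback).encode("utf-8")


def parse_items(raw):
    """Parse the recoded document and return the text of each <item>."""
    doc = minidom.parseString(recode_to_utf8(raw))
    return [
        "".join(n.data for n in item.childNodes if n.nodeType == n.TEXT_NODE)
        for item in doc.getElementsByTagName("item")
    ]


# Hypothetical sample: declared as UTF-8, but the e-acute is the single
# latin-1 byte 0xE9, which is not valid UTF-8 on its own.
sample = (
    b'<?xml version="1.0" encoding="UTF-8" ?>'
    b"<items><item>/OU=Univesit\xe9 Catholique de Louvain/CN=Roberfroid</item></items>"
)

print(parse_items(sample))
```

In getItems this would mean passing the result of xml.read() through
recode_to_utf8 before calling minidom.parseString. On Python 2 the same
recoding is usually written unicode(data, "latin-1").encode("utf-8").
Note that the fallback recodes the whole document, so a page that mixes
valid UTF-8 and latin-1 bytes would come out with some characters
mangled; fixing the page at the source remains the cleaner option.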