Pablo Rey wrote:
> I am getting the following error with a XML page:
>
>>   File "/home/prey/RAL-CESGA/bin/voms2users/voms2users.py", line 69, in getItems
>>     d = minidom.parseString(xml.read())
>>   File "/usr/lib/python2.2/site-packages/_xmlplus/dom/minidom.py", line 967, in parseString
>>     return _doparse(pulldom.parseString, args, kwargs)
>>   File "/usr/lib/python2.2/site-packages/_xmlplus/dom/minidom.py", line 954, in _doparse
>>     toktype, rootNode = events.getEvent()
>>   File "/usr/lib/python2.2/site-packages/_xmlplus/dom/pulldom.py", line 265, in getEvent
>>     self.parser.feed(buf)
>>   File "/usr/lib/python2.2/site-packages/_xmlplus/sax/expatreader.py", line 208, in feed
>>     self._err_handler.fatalError(exc)
>>   File "/usr/lib/python2.2/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
>>     raise exception
>> xml.sax._exceptions.SAXParseException: <unknown>:553:48: not well-formed (invalid token)
>
>> def getItems(page):
>>     opener =urllib.URLopener(key_file=HOSTKEY,cert_file=HOSTCERT) ;
>>     try:
>>         xml = opener.open(page)
>>     except:
>>         return []
>>
>>     d = minidom.parseString(xml.read())
>>     items = d.getElementsByTagName('item')
>>     data = []
>>     for i in items:
>>         data.append(getText(i.childNodes))
>>
>>     return data
>
> The page is
> https://lcg-voms.cern.ch:8443/voms/cms/services/VOMSCompatibility?method=getGridmapUsers
> and the line with the invalid character is (the invalid character is the
> final é of Université):
>
> <item>/C=BE/O=BEGRID/OU=Physique/OU=Univesité Catholique de Louvain/CN=Roberfroid</item>
>
> I have tried several options but I am not able to avoid this
> problem. Any idea?.

Looks like the page is not well-formed XML (i.e. not XML at all). If it
doesn't specify an encoding (<?xml version="1.0" encoding="..."?>), the
parser has to assume UTF-8. If the page is really latin-1, the "é" is the
single byte 0xE9, which is not a valid UTF-8 sequence on its own, hence the
"invalid token" error. You can try recoding the input, possibly decoding it
from latin-1 and re-encoding it as UTF-8 before passing it to the parser.
Alternatively, tell the page authors to fix their page.

Stefan
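
P.S. A minimal sketch of the recoding idea, written for a current Python 3
rather than the 2.2 in the traceback. It assumes the page really is latin-1
and carries no encoding declaration; I have not checked either against the
actual server. The names recode_latin1_to_utf8, parse_items and sample are
made up for illustration, and the sample document is a shortened stand-in
for the real page, not its actual content.

```python
from xml.dom import minidom


def recode_latin1_to_utf8(raw):
    """Decode bytes as latin-1 and re-encode them as UTF-8.

    Assumption: the input really is latin-1. Every byte value is valid
    latin-1, so this never fails, but it silently produces wrong
    characters if the guess about the encoding is wrong.
    """
    return raw.decode("latin-1").encode("utf-8")


def parse_items(raw):
    """Return the text content of every <item> element in raw (bytes)."""
    doc = minidom.parseString(recode_latin1_to_utf8(raw))
    return [
        "".join(node.data for node in item.childNodes
                if node.nodeType == node.TEXT_NODE)
        for item in doc.getElementsByTagName("item")
    ]


# Hypothetical stand-in for the page: latin-1 bytes, no encoding declaration.
sample = (
    "<items><item>/C=BE/OU=Universit\u00e9 Catholique de Louvain"
    "/CN=Roberfroid</item></items>"
).encode("latin-1")

for dn in parse_items(sample):
    print(dn)
```

In getItems this would amount to passing xml.read() through the recoding
step before handing it to minidom.parseString. If the page does declare an
encoding, respect that declaration instead of forcing latin-1.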