I didn't make the XML file, and I don't like messing with other people's data, so I just want my SAX parser to ignore the DOCTYPE declaration. I can't help it if other people make their XML files hard to read...
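A minimal sketch of what "just ignore it" can look like by switching off `feature_external_ges`, the feature Uche quotes below. It assumes modern Python 3 with the stock expat-based reader, not the Python 2.3/PyXML setup from the thread. The handler name `EntrezGeneHandler` is borrowed from the quoted code but collects names instead of printing them; `parse_ignoring_dtd` and `SAMPLE` are hypothetical. With CPython's `xml.sax.expatreader`, this flag also decides whether the external DTD named in the DOCTYPE gets fetched; other SAX drivers may behave differently.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class EntrezGeneHandler(ContentHandler):
    """Records element names (the quoted handler printed them instead)."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_ignoring_dtd(source):
    """Parse `source` (filename or file object) without loading external entities.

    With CPython's expat-based reader this also skips the external DTD
    subset named in the DOCTYPE, so a missing or broken DTD is never read.
    """
    parser = xml.sax.make_parser()
    # Features can only be changed before parsing starts.
    parser.setFeature(feature_external_ges, False)
    handler = EntrezGeneHandler()
    parser.setContentHandler(handler)
    parser.parse(source)
    return handler.names


# Hypothetical sample: same DOCTYPE as in the thread, no DTD file anywhere.
SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE Entrezgene-Set PUBLIC "-//NCBI//NCBI Entrezgene/EN" "NCBI_Entrezgene.dtd">
<Entrezgene-Set><Entrezgene/></Entrezgene-Set>
"""

if __name__ == "__main__":
    print(parse_ignoring_dtd(io.BytesIO(SAMPLE)))  # ['Entrezgene-Set', 'Entrezgene']
```

Since Python 3.7.1 the flag already defaults to false, so setting it mostly documents intent. The trade-off: any entities declared only in that DTD won't be expanded, so this only suits documents that don't depend on them.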
On Sat, 23 Apr 2005 13:48:49 -0600, Uche Ogbuji wrote:

> On Sat, 2005-04-23 at 15:20 +0200, Willem Ligtenberg wrote:
>> I decided to use SAX to parse my xml file.
>> But the parser crashes on:
>>   File "/usr/lib/python2.3/site-packages/_xmlplus/sax/handler.py", line 38,
>> in fatalError
>>     raise exception
>> xml.sax._exceptions.SAXParseException: NCBI_Entrezgene.dtd:8:0: error in
>> processing external entity reference
>>
>> This is caused by:
>> <!DOCTYPE Entrezgene-Set PUBLIC "-//NCBI//NCBI Entrezgene/EN"
>> "NCBI_Entrezgene.dtd">
>>
>> If I remove it, it parses normally.
>> I've created my parser like this:
>> import sys
>> from xml.sax import make_parser
>> from handler import EntrezGeneHandler
>>
>> fopen = open("mouse2.xml", "r")
>> ch = EntrezGeneHandler()
>> saxparser = make_parser()
>> saxparser.setContentHandler(ch)
>> saxparser.parse(fopen)
>>
>> And the handler is:
>> from xml.sax import ContentHandler
>>
>> class EntrezGeneHandler(ContentHandler):
>>     """
>>     A handler to deal with EntrezGene in XML
>>     """
>>
>>     def startElement(self, name, attrs):
>>         print "Start element:", name
>>
>> So it doesn't do much yet. And still it crashes...
>> How can I tell the parser not to look at the DOCTYPE declaration.
>> On a website:
>> http://www.devarticles.com/c/a/XML/Parsing-XML-with-SAX-and-Python/1/
>> it states that the SAX parsers are not validating, so this error shouldn't
>> even occur?
>
> Just because it's not validating doesn't mean that the parser won't try
> to read the external entity.
>
> Maybe you're looking for
>
> """
> feature_external_ges
> Value: "http://xml.org/sax/features/external-general-entities"
> true: Include all external general (text) entities.
> false: Do not include external general entities.
> access: (parsing) read-only; (not parsing) read/write
> """
>
> Quote from:
>
> http://docs.python.org/lib/module-xml.sax.handler.html
>
> But you're on pretty shaky ground in any XML 1.x toolkit using a bogus
> DTDecl in this way. Why go through the hassle? Why not use a catalog,
> or remove the DTDecl?

--
http://mail.python.org/mailman/listinfo/python-list
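Uche's catalog suggestion can be approximated with a custom `EntityResolver`. The Python standard library doesn't ship an OASIS XML catalog implementation, so the dict-based `LocalCatalogResolver` here is a hypothetical stand-in: known public IDs map to local DTD files, and anything else resolves to an empty DTD instead of being looked up. This is a sketch assuming Python 3's `xml.sax`, not a real catalog processor; `NameCollector`, `parse_with_catalog` and `SAMPLE` are likewise illustrative names.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource


class LocalCatalogResolver(EntityResolver):
    """Hypothetical catalog stand-in: public ID -> local DTD path.

    Known public IDs are read from local files; anything else gets an empty
    DTD, so no network or missing-file lookup happens for it.
    """

    def __init__(self, catalog):
        self.catalog = catalog  # {public_id: local_path}

    def resolveEntity(self, publicId, systemId):
        source = InputSource(systemId)
        path = self.catalog.get(publicId)
        if path is not None:
            with open(path, "rb") as f:
                source.setByteStream(io.BytesIO(f.read()))
        else:
            source.setByteStream(io.BytesIO(b""))
        return source


class NameCollector(ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_with_catalog(source, catalog):
    parser = xml.sax.make_parser()
    # Keep external entities on, but route every lookup through the resolver.
    parser.setFeature(feature_external_ges, True)
    parser.setEntityResolver(LocalCatalogResolver(catalog))
    handler = NameCollector()
    parser.setContentHandler(handler)
    parser.parse(source)
    return handler.names


SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE Entrezgene-Set PUBLIC "-//NCBI//NCBI Entrezgene/EN" "NCBI_Entrezgene.dtd">
<Entrezgene-Set><Entrezgene/></Entrezgene-Set>
"""

if __name__ == "__main__":
    # Empty catalog: the DTD resolves to nothing and parsing still succeeds.
    print(parse_with_catalog(io.BytesIO(SAMPLE), {}))
```

Compared with turning the feature off, this keeps entity expansion working for any DTD you do have a local copy of, while an unknown DTD no longer aborts the parse.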