On Sat, 2005-04-23 at 15:20 +0200, Willem Ligtenberg wrote:
> I decided to use SAX to parse my xml file.
> But the parser crashes on:
>   File "/usr/lib/python2.3/site-packages/_xmlplus/sax/handler.py", line 38,
> in fatalError
>     raise exception
> xml.sax._exceptions.SAXParseException: NCBI_Entrezgene.dtd:8:0: error in
> processing external entity reference
>
> This is caused by:
> <!DOCTYPE Entrezgene-Set PUBLIC "-//NCBI//NCBI Entrezgene/EN"
> "NCBI_Entrezgene.dtd">
>
> If I remove it, it parses normally.
> I've created my parser like this:
> import sys
> from xml.sax import make_parser
> from handler import EntrezGeneHandler
>
> fopen = open("mouse2.xml", "r")
> ch = EntrezGeneHandler()
> saxparser = make_parser()
> saxparser.setContentHandler(ch)
> saxparser.parse(fopen)
>
> And the handler is:
> from xml.sax import ContentHandler
>
> class EntrezGeneHandler(ContentHandler):
>     """
>     A handler to deal with EntrezGene in XML
>     """
>
>     def startElement(self, name, attrs):
>         print "Start element:", name
>
> So it doesn't do much yet. And still it crashes...
> How can I tell the parser not to look at the DOCTYPE declaration.
> On a website:
> http://www.devarticles.com/c/a/XML/Parsing-XML-with-SAX-and-Python/1/
> it states that the SAX parsers are not validating, so this error shouldn't
> even occur?

Just because it's not validating doesn't mean that the parser won't try
to read the external entity. Maybe you're looking for

"""
feature_external_ges
Value: "http://xml.org/sax/features/external-general-entities"
true: Include all external general (text) entities.
false: Do not include external general entities.
access: (parsing) read-only; (not parsing) read/write
"""

Quote from:

http://docs.python.org/lib/module-xml.sax.handler.html

But you're on pretty shaky ground in any XML 1.x toolkit using a bogus
document type declaration (DTDecl) in this way. Why go through the
hassle? Why not use a catalog, or remove the DTDecl?

-- 
Uche Ogbuji                               Fourthought, Inc.
http://uche.ogbuji.net    http://fourthought.com
http://copia.ogbuji.net   http://4Suite.org
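
For illustration, here is a minimal sketch of the feature_external_ges suggestion above. It is written for Python 3 rather than the Python 2.3 used in the thread, and it assumes the default expat-based reader returned by make_parser(). The helper name parse_without_external_entities and the names list on the handler are hypothetical additions for this example, not part of the original code.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, feature_external_ges


class EntrezGeneHandler(ContentHandler):
    """Collects element names instead of printing them (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.names = []  # hypothetical attribute, added for this sketch

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_without_external_entities(source):
    """Parse `source` with external general entities switched off.

    `source` may be a file-like object or a filename. The feature has to
    be set before parse() is called, since it is read-only while parsing.
    """
    handler = EntrezGeneHandler()
    parser = make_parser()
    # Ask the reader not to fetch external entities such as the DTD file.
    parser.setFeature(feature_external_ges, False)
    parser.setContentHandler(handler)
    parser.parse(source)
    return handler.names


if __name__ == "__main__":
    # The DTD named here does not need to exist on disk for this to parse.
    doc = (
        b'<?xml version="1.0"?>\n'
        b'<!DOCTYPE Entrezgene-Set PUBLIC "-//NCBI//NCBI Entrezgene/EN" '
        b'"NCBI_Entrezgene.dtd">\n'
        b"<Entrezgene-Set><Entrezgene/></Entrezgene-Set>"
    )
    for name in parse_without_external_entities(io.BytesIO(doc)):
        print("Start element:", name)
```

Note that with the feature off, any entities declared only in the external DTD will not be expanded, so this is a workaround rather than a substitute for a catalog or a resolvable DTD.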