Aloha, I'm trying to write an XML filter that extracts some information about an .xml document (with external entities), especially its start elements and external entities. The document is DocBook XML and, as far as I can see, well-formed; it passes our DocBook toolchain (dblatex etc.).
My parser is very simple:

[115] scylla(scylla)> more pbxml.py
class xmlhandle:
    def __init__(self):
        self.parser_stack = [];
        self.parser = None;

    def se(self,name,attr):
        print "s", self.parser.CurrentLineNumber, name, attr

    def ex(self,context,baseid,n1,n2):
        print "x",context,n1,n2

def fromxml(fname):
    import xml.parsers.expat
    p = xml.parsers.expat.ParserCreate()
    xl = xmlhandle()
    p.StartElementHandler = xl.se
    p.ExternalEntityRefHandler = xl.ex
    xl.parser = p
    p.ParseFile(file(fname))
    return

if __name__ == "__main__":
    import sys
    fromxml(sys.argv[1])

My document (in 2 parts):

[116] scylla(scylla)> more s3.xml
<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "/usr/share/xml/docbook/xml/4.2/docbookx.dtd"
[
<!ENTITY bookinfo SYSTEM "bookinfo.xml">
]>
<book>
&bookinfo;
<chapter id="technicalDescription"><title>technical description</title>
<para>
This chapter includes specification of the main simulation loop.
</para>
</chapter>
</book>

[118] scylla(scylla)> more bookinfo.xml
<bookinfo>
<title>BookTitle</title>
<authorgroup>
<author>
<firstname>A</firstname>
<surname>B</surname>
</author>
</authorgroup>
</bookinfo>

The run produces:

[120] scylla(scylla)> python pbxml.py s3.xml
s 7 book {}
x bookinfo bookinfo.xml None
s 9 chapter {u'id': u'technicalDescription'}
s 9 title {}
s 10 para {}
Traceback (most recent call last):
  File "pbxml.py", line 25, in ?
    fromxml(sys.argv[1])
  File "pbxml.py", line 20, in fromxml
    p.ParseFile(file(fname))
TypeError: an integer is required

Does anyone have an idea where the error is produced? And does anyone know how to debug this (if it is really a bug, or just my misunderstanding of expat)?

Hoping for an answer and wishing you a happy day,
LOBI
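A possible cause, offered as a sketch rather than a confirmed diagnosis: the Python documentation for `ExternalEntityRefHandler` says the handler should return an integer (0 signals an error, anything else lets parsing continue), and the `ex` method above returns None implicitly. The version below is written for Python 3 rather than the Python 2 of the original script. The function name `collect_events`, the event tuples, and the choice to resolve the system identifier relative to the main document's directory are assumptions made for illustration only.

```python
import os
import xml.parsers.expat


def collect_events(path):
    """Return start-element and external-entity events for the file at path."""
    events = []
    # Assumption: system identifiers are relative to the main document.
    base_dir = os.path.dirname(os.path.abspath(path))

    def install_handlers(parser):
        def start_element(name, attrs):
            events.append(("s", name))

        def external_entity(context, base, system_id, public_id):
            events.append(("x", system_id))
            # Parse the referenced file with a sub-parser so that its
            # start elements are reported as well.
            sub = parser.ExternalEntityParserCreate(context)
            install_handlers(sub)
            with open(os.path.join(base_dir, system_id), "rb") as handle:
                sub.ParseFile(handle)
            # The handler is expected to return an integer;
            # a non-zero value tells expat to continue parsing.
            return 1

        parser.StartElementHandler = start_element
        parser.ExternalEntityRefHandler = external_entity

    main_parser = xml.parsers.expat.ParserCreate()
    install_handlers(main_parser)
    with open(path, "rb") as handle:
        main_parser.ParseFile(handle)
    return events
```

Calling `collect_events("s3.xml")` should then list the elements of bookinfo.xml between `book` and `chapter`. If only the entity reference itself is of interest, the sub-parser part can be dropped and the handler reduced to recording the reference and returning 1.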