sping <sebast...@pipping.org> added the comment:
Hi StyXman,

I had a closer look at the files you shared; thanks for those, they were very helpful!

What I found is that expat_test.py uses a single scalar variable (_DictSAXHandler.parser) to keep track of the current parser, while recursion into nested external entities needs a stack. Every nested parser overwrites that variable, and nothing restores the parent parser once the nested parse returns. In a way, the current approach is equivalent to walking up the stack as expected but never going back down. Once I make the code use a stack, the loop goes away. I'm pasting the patch inline below (indented globally by two spaces).

During debugging, I used these commands to compare libexpat's internal behavior under xmlwf and pyexpat; they may be of interest:

  EXPAT_ACCOUNTING_DEBUG=2 python expat_test.py |& sed 's,0x[0-9a-f]\+,XXX,' | tee pyexpat.txt
  EXPAT_ACCOUNTING_DEBUG=2 xmlwf -x test1.xml |& sed 's,0x[0-9a-f]\+,XXX,' | tee xmlwf.txt
  diff -u xmlwf.txt pyexpat.txt

Here's how I quick-fixed expat_test.py to make things work:

  # diff -u expat_test.py_ORIG expat_test.py
  --- expat_test.py_ORIG	2022-01-26 21:15:27.506458671 +0100
  +++ expat_test.py	2022-01-26 22:15:08.741384932 +0100
  @@ -7,11 +7,21 @@
   
       parser.ExternalEntityRefHandler = handler.externalEntityRef
   
  -    # store the parser in the handler so we can recurse
  -    handler.parser = parser
  -
   
   class _DictSAXHandler(object):
  +    def __init__(self):
  +        self._parsers = []
  +
  +    def push_parser(self, parser):
  +        self._parsers.append(parser)
  +
  +    def pop_parser(self):
  +        self._parsers.pop()
  +
  +    @property
  +    def parser(self):
  +        return self._parsers[-1]
  +
       def externalEntityRef(self, context, base, sysId, pubId):
           print(context, base, sysId, pubId)
           external_parser = self.parser.ExternalEntityParserCreate(context)
  @@ -19,7 +29,9 @@
           setup_parser(external_parser, self)
           f = open(sysId, 'rb')
           print(f)
  +        self.push_parser(external_parser)
           external_parser.ParseFile(f)
  +        self.pop_parser()
           print(f)
   
           # all OK
  @@ -36,12 +48,13 @@
           namespace_separator
       )
       setup_parser(parser, handler)
  +    handler.push_parser(parser)
   
       if hasattr(xml_input, 'read'):
           parser.ParseFile(xml_input)
       else:
           parser.Parse(xml_input, True)
  -    return handler.item
  +    # return handler.item  # there is no .item
   
   
   parse(open('test1.xml', 'rb'))

What do you think?

PS: Please note that processing external entities has security implications (see https://en.wikipedia.org/wiki/XML_external_entity_attack).

Best,
Sebastian

----------
nosy: +sping

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue38487>
_______________________________________
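For illustration, here is a minimal, self-contained sketch of the same push/pop idea. It assumes in-memory entity sources instead of files on disk; ENTITY_SOURCES, DOCUMENT, StackingHandler and run() are hypothetical names, not part of the original expat_test.py. Unlike the quick fix, it pops inside a try/finally, so a failing nested parse cannot leave a stale child parser on top of the stack.

```python
import xml.parsers.expat

# Hypothetical in-memory stand-ins for the external entity files on disk;
# looking them up by system ID avoids touching the filesystem.
ENTITY_SOURCES = {
    "outer.ent": b"<inner>&inner;</inner>",
    "inner.ent": b"<leaf/>",
}

# Root document that references "outer", which in turn references "inner".
DOCUMENT = b"""<?xml version="1.0"?>
<!DOCTYPE root [
  <!ENTITY outer SYSTEM "outer.ent">
  <!ENTITY inner SYSTEM "inner.ent">
]>
<root>&outer;</root>
"""


class StackingHandler:
    """Tracks the active parser with a stack so nested entities resolve correctly."""

    def __init__(self):
        self._parsers = []
        self.elements = []   # element names in document order
        self.max_depth = 0   # deepest parser nesting observed

    @property
    def parser(self):
        # The parser currently executing is always the top of the stack.
        return self._parsers[-1]

    def run(self, parser, data):
        # Push before parsing and pop afterwards, even if parsing raises,
        # so the stack never keeps pointing at a finished child parser.
        self._parsers.append(parser)
        self.max_depth = max(self.max_depth, len(self._parsers))
        try:
            parser.Parse(data, True)
        finally:
            self._parsers.pop()

    def start_element(self, name, attrs):
        self.elements.append(name)

    def external_entity_ref(self, context, base, system_id, public_id):
        # Create the child from the parser that hit the reference (top of stack).
        child = self.parser.ExternalEntityParserCreate(context)
        setup_parser(child, self)
        self.run(child, ENTITY_SOURCES[system_id])
        return 1  # non-zero tells expat to continue parsing


def setup_parser(parser, handler):
    parser.StartElementHandler = handler.start_element
    parser.ExternalEntityRefHandler = handler.external_entity_ref


handler = StackingHandler()
root_parser = xml.parsers.expat.ParserCreate()
setup_parser(root_parser, handler)
handler.run(root_parser, DOCUMENT)
print(handler.elements, handler.max_depth)
```

The document nests two external entities, so three parsers are active at the deepest point. Each ExternalEntityParserCreate call is made on whichever parser sits on top of the stack, which is the one that encountered the entity reference; once that nested parse finishes, the stack falls back to its parent.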