Hello all, re-replying to Jim's message.

On Wed, Feb 03, 2021 at 02:25:16PM -0500, Jim Jagielski wrote:
> Funny that you bring this up... I've been tracking down some bugs and they
> all seem to be XML related... fastsax->libwriterfilter with occasional cores
> due to __cxa_call_unexpected.
>
> I feel that making AOO more fragile by trying to work around cases where
> invalid and/or non-compliant XML is encountered is just wrong. We should
> either ignore the error (catch it) or raise an exception. Invalid data
> shouldn't be tolerated. Additionally, trying to be "lenient" is an easy
> vector for vulnerabilities.

For the record: duplicated attributes are detected internally by the expat
library. Our code just receives the error message and cannot do anything to
recover from it.

I don't believe it's worth patching expat to allow duplicated attributes. I
don't know the library well and I am wary of the consequences of tinkering
with it.

But then my question becomes: do we want to offer any data recovery tools for
corrupted documents? Like "dumb" XML parsers that just shave away XML errors?

1- It could be an external tool, written in a language that is easier to code
   in (like Python, Perl, Java... whatever).

2- Or an internal pre-parsing phase? It should not be based on the expat
   library though; do we have any other possibilities among the current
   modules?

3- Or we leave it to hand-crafting by knowledgeable people on the forum, as
   is happening now?

I am looking forward to opinions... and possibly reviews of PR 122, please ;-)

Best regards,
-- 
Arrigo
http://rigo.altervista.org
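
P.S. To make option 1 a bit more concrete, here is a minimal sketch in Python of what a "dumb" external cleaner might look like. It is only an illustration under assumptions, not a proposal for the actual tool: the names drop_duplicate_attributes and expat_error are hypothetical, and the regular expressions are deliberately naive (they would also rewrite tag-like text inside comments or CDATA sections). Python's xml.parsers.expat binding is used only to confirm that expat rejects the input before the clean-up and accepts it afterwards.

```python
import re
import xml.parsers.expat

# Naive patterns (an assumption of this sketch, not a full XML grammar):
# one start tag or empty-element tag, and one name="value" pair inside it.
TAG_RE = re.compile(
    r'<([A-Za-z_][\w:.-]*)'                              # element name
    r'((?:\s+[\w:.-]+\s*=\s*(?:"[^"]*"|\'[^\']*\'))*)'   # attribute list
    r'\s*(/?)>'                                          # optional "/" and ">"
)
ATTR_RE = re.compile(r'([\w:.-]+)\s*=\s*("[^"]*"|\'[^\']*\')')


def drop_duplicate_attributes(xml_text):
    """Keep only the first occurrence of each attribute name in every tag."""
    def fix_tag(match):
        name, attrs, slash = match.groups()
        seen = set()
        kept = []
        for attr_name, value in ATTR_RE.findall(attrs):
            if attr_name not in seen:
                seen.add(attr_name)
                kept.append(f' {attr_name}={value}')
        return f'<{name}{"".join(kept)}{slash}>'
    return TAG_RE.sub(fix_tag, xml_text)


def expat_error(xml_text):
    """Return expat's error message for xml_text, or None if it parses."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError as exc:
        return xml.parsers.expat.ErrorString(exc.code)
    return None


if __name__ == "__main__":
    broken = '<p a="1" a="2" b="3">text</p>'
    print(expat_error(broken))                # expat refuses the duplicate
    fixed = drop_duplicate_attributes(broken)
    print(fixed)
    print(expat_error(fixed))                 # None once the duplicate is gone
```

Keeping the first occurrence of a duplicated attribute is an arbitrary choice here; a real recovery tool would probably need to report what it dropped so that the user can review the result.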