Perhaps a little risky, but what is preventing you from working directly on the raw XML? The Office file is simply a zip archive containing various folders and files. If you know where the 'offending' markup is, you could unzip the archive, fix the XML either by hand or with a read/write parser, and then zip the archive back up again.
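To make the unzip/edit/re-zip idea concrete, here is a rough sketch that uses only java.util.zip and needs Java 9 or later. The class name DocxXmlPatcher, the patch method and the 'fix' callback are all made up for illustration and are not part of POI. It also assumes the part you want to change is UTF-8 encoded, which is usual for .docx files but worth checking against the XML declaration first.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.UnaryOperator;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not a POI class.
public class DocxXmlPatcher {

    /**
     * Copies the archive at 'source' to 'target', entry by entry. The entry
     * named 'partName' is decoded as UTF-8 (an assumption), passed through
     * 'fix', and written back; every other entry is copied unchanged.
     */
    public static void patch(Path source, Path target, String partName,
                             UnaryOperator<String> fix) throws IOException {
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(source));
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(target))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                // Use a fresh ZipEntry so size and CRC are recomputed on write.
                out.putNextEntry(new ZipEntry(entry.getName()));
                if (entry.getName().equals(partName)) {
                    String xml = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                    out.write(fix.apply(xml).getBytes(StandardCharsets.UTF_8));
                } else {
                    // Reads up to the end of the current entry only.
                    in.transferTo(out);
                }
                out.closeEntry();
            }
        }
    }
}
```

You would call it along the lines of DocxXmlPatcher.patch(in, out, "word/document.xml", xml -> xml.replace(badMarkup, "")), where badMarkup stands for whatever string you have identified as the problem. In a .docx the main body normally lives in word/document.xml, but do check which part actually holds the offending markup. A plain string replacement is the crudest possible fix; a proper XML parser could be plugged into the same callback. Write to a new file and keep the original until you have confirmed that Word opens the result.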
I am also compelled to ask: what happens if you use Office to convert one of the offending files? How does the markup it produces differ from what LibreOffice produces?
