Perhaps a little risky, but what is preventing you from working directly on
the raw XML? An Office file such as a .docx is simply a zipped archive
containing various folders and files. If you know where the 'offending'
markup is, what is to prevent you from unzipping the archive, manipulating
the XML either directly or with a read/write parser, and then zipping the
archive back up again?
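A minimal sketch of that unzip/edit/re-zip round trip, using only Python's standard `zipfile` module. It assumes the markup lives in `word/document.xml` (the usual main part of a .docx) and that the fix is a plain byte substitution. `OFFENDING`, `strip_offending` and `rewrite_part` are hypothetical names for illustration, not anything from POI.

```python
import zipfile

# Hypothetical placeholder: substitute the fragment you identified as invalid.
OFFENDING = b"<w:offending/>"


def strip_offending(xml: bytes) -> bytes:
    """Hypothetical fix: edit the raw XML bytes directly by deleting
    every occurrence of the offending fragment."""
    return xml.replace(OFFENDING, b"")


def rewrite_part(src_path: str, dst_path: str, part_name: str, fix) -> None:
    """Copy every entry of the archive at src_path into a new archive at
    dst_path, passing the bytes of part_name through fix() on the way."""
    with zipfile.ZipFile(src_path, "r") as src, \
         zipfile.ZipFile(dst_path, "w") as dst:
        # infolist() keeps the original entry order, so [Content_Types].xml
        # stays where it was; each ZipInfo also keeps its compression type.
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == part_name:
                data = fix(data)
            dst.writestr(info, data)
```

For example, `rewrite_part("in.docx", "out.docx", "word/document.xml", strip_offending)`. Editing the bytes directly avoids a parser renaming namespace prefixes or dropping "unused" declarations, which can matter because attributes such as `mc:Ignorable` refer to prefixes by name. If you do go through a parser, check that the result still opens in Word.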

I am also compelled to ask: what happens if you use Office to convert one of
the offending files? How does the markup it produces differ from the markup
LibreOffice produces?
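One way to answer that comparison question is to pull the same part out of both files and diff it. A rough sketch, assuming the parts are UTF-8 encoded and that the interesting differences are in `word/document.xml`; `diff_part` is a hypothetical helper. Word often writes a whole part on one or two long lines, so the sketch splits at tag boundaries first to get a readable line-by-line diff.

```python
import difflib
import zipfile


def _split_tags(xml: str) -> list:
    # Put each tag on its own line so the diff is not one giant line.
    return xml.replace("><", ">\n<").splitlines()


def diff_part(path_a: str, path_b: str, part_name: str = "word/document.xml") -> list:
    """Return a unified diff (as a list of lines) of one XML part taken
    from two OOXML packages, e.g. an Office-saved and a LibreOffice-saved .docx."""
    with zipfile.ZipFile(path_a) as a, zipfile.ZipFile(path_b) as b:
        xml_a = a.read(part_name).decode("utf-8")  # assumption: UTF-8 parts
        xml_b = b.read(part_name).decode("utf-8")
    return list(difflib.unified_diff(
        _split_tags(xml_a), _split_tags(xml_b),
        fromfile=path_a, tofile=path_b, lineterm=""))
```

Printing `"\n".join(diff_part("office.docx", "libreoffice.docx"))` then shows only the elements that differ between the two conversions.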



--
View this message in context: 
http://apache-poi.1045710.n5.nabble.com/Remove-Invalid-XML-in-DOCX-tp5718602p5718619.html
Sent from the POI - User mailing list archive at Nabble.com.