Thanks to everyone who replied! Your answers pointed me toward simply using another library to parse HTML; some of these libraries can be interfaced with SAX (though not perfectly), as I had hoped. I succeeded using the Nu htmlparser library.
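For anyone finding this thread later, here is a minimal sketch of the SAX side. It is not my actual code. The point is that the `ContentHandler` does not care which `XMLReader` feeds it events. To keep the sketch runnable with the JDK alone, it uses the JDK's XML SAX parser, which only accepts well-formed markup. With Nu htmlparser on the classpath, you would presumably pass `new nu.validator.htmlparser.sax.HtmlParser(XmlViolationPolicy.ALTER_INFOSET)` instead, because that class implements `org.xml.sax.XMLReader`. The class and method names (`HtmlSaxSketch`, `collectElementNames`) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class HtmlSaxSketch {

    /** Collects the name of every start tag reported through SAX. */
    static final class ElementCollector extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Namespace-aware readers fill localName; fall back to qName otherwise.
            names.add(localName.isEmpty() ? qName : localName);
        }
    }

    /** Runs any SAX XMLReader over the input and returns the element names it reports. */
    static List<String> collectElementNames(XMLReader reader, InputSource input) throws Exception {
        ElementCollector collector = new ElementCollector();
        reader.setContentHandler(collector);
        reader.parse(input);
        return collector.names;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in reader (assumption, for a JDK-only demo): an XML parser, which
        // rejects non-well-formed HTML. With Nu htmlparser available, one would use
        //   new nu.validator.htmlparser.sax.HtmlParser(
        //       nu.validator.htmlparser.common.XmlViolationPolicy.ALTER_INFOSET)
        // here instead, so that non-XML HTML is accepted as well.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        String page = "<html><body><p>Hello</p></body></html>";
        List<String> names = collectElementNames(reader, new InputSource(new StringReader(page)));
        System.out.println(names);
    }
}
```

Keeping the handler independent of the parser means that switching from an XML parser to an HTML5 parser only changes the line that constructs the `XMLReader`.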
On Thursday, 29 May 2025 at 19:02 +0300, Stanimir Stamenkov wrote:
> Tue, 27 May 2025 17:08:55 +0000, /Olivier Cailloux/:
>> Can anyone point me towards some way of reading HTML (non-XML) files using Xerces-J? I tried various things using org.apache.xerces.parsers.DOMParserImpl, but parsing this file, for example (valid according to the Nu validator), fails.