The SAXParseExceptions [1] reported to your ErrorHandler and thrown by the XMLReader contain location information. See getColumnNumber() and getLineNumber().
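A minimal sketch of how that location information could be read, assuming the parser is obtained through the standard JAXP SAXParserFactory rather than by instantiating SAXParserImpl.JAXPSAXParser directly. The class and method names (ParseErrorLocation, LocatingHandler, firstFatalLocation) are made up for illustration; only the org.xml.sax and javax.xml.parsers calls are real API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical helper class, not part of Xerces.
class ParseErrorLocation {

    // Records every problem reported by the parser together with its location.
    static class LocatingHandler implements ErrorHandler {
        final List<String> problems = new ArrayList<>();

        private void record(String kind, SAXParseException e) {
            problems.add(kind + " at line " + e.getLineNumber()
                    + ", column " + e.getColumnNumber() + ": " + e.getMessage());
        }

        @Override
        public void warning(SAXParseException e) {
            record("warning", e);
        }

        @Override
        public void error(SAXParseException e) {
            record("error", e);
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            record("fatal", e);
            throw e; // stop parsing; the caller sees the same exception
        }
    }

    // Returns {line, column} of the first fatal error, or null if the input parsed cleanly.
    static int[] firstFatalLocation(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setErrorHandler(new LocatingHandler());
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return new int[] { e.getLineNumber(), e.getColumnNumber() };
        }
    }

    public static void main(String[] args) throws Exception {
        // <b> is never closed, so the parser fails when it reaches </a>.
        int[] loc = firstFatalLocation("<a>\n<b>\n</a>");
        System.out.println(loc == null ? "no error" : "line " + loc[0] + ", column " + loc[1]);
    }
}
```

Both numbers are 1-based and describe where the parser detected the problem, which may be a little after the text that caused it; either may be -1 when the parser cannot supply a location.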
[1] http://xerces.apache.org/xerces2-j/javadocs/api/org/xml/sax/SAXParseException.html

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]

Julien Buchanan <[EMAIL PROTECTED]> wrote on 03/26/2007 05:17:46 PM:

> hi.
> i'd like to use SAXParserImpl.JAXPSAXParser from xerces2-j to parse and
> correct html/xhtml user input.
> - is there a way to know at which line/column (or, since there's
> probably no line counter, at which character index) an error happened?
>
> i didn't see where to get that info in the api docs, and my attempt to
> hijack the input stream to get count position myself also failed since
> after the first couple chars read individually, xerces reads 2045 at
> once (which makes sense, performance-wise).
> so how do i know at what char a parse error happens?
>
> also, as a side-question, not necessarily xerces-related, but you guys
> might know and spare me some wheel-reinventing:
> i'm looking for a solution in java to sanitize untrusted user-input
> html/xhtml of not just malFORMED stuff (tagsoup or an own xerces-based
> parser solution would do that) but specifically MALICIOUS input, e.g.
> XSS attempts etc. for php, there's htmlPurifier
> ( http://hp.jpsband.org/ ), but for java, i found nothing equivalent.
> Do you know of some existing solution for that?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]