Hi Elliotte,

I had a look at your article and see in the code snippets that what you're
calling the "actual encoding" or "real encoding" actually isn't. The one
passed to startDocument() in XNI is the auto-detected encoding, i.e. the one
Xerces guessed by peeking at the first few bytes of the document. The
actual encoding may not be known until the XML declaration has been read,
and at that point it hasn't been read yet.

In SAX it's not legal to read from the Locator in startDocument(), so any
calls you make to the Locator in that method may not return correct
results. With Xerces they generally won't, because at the point it calls
startDocument() it hasn't read enough of the document to be sure what the
actual encoding is. If it looked like it was working, you were probably
just getting lucky because the documents you tried were in UTF-8 or
UTF-16. Specifically, the Javadoc [1] says: "Note that the locator will
return correct information only during the invocation SAX event callbacks
after startDocument returns and before endDocument is called. The
application should not attempt to use it at any other time." So you have to
wait until an event following startDocument() before you can read the
encoding (or anything else) from the Locator.
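
For what it's worth, here is a minimal sketch of the safer pattern: store
the Locator in setDocumentLocator() and only query it from a later
callback such as startElement(). It assumes the parser hands you a Locator
that also implements org.xml.sax.ext.Locator2 (which is optional in SAX).
The class name EncodingProbe and the detect() helper are made up for
illustration and are not taken from the article.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records the encoding reported by the Locator,
// but only once the parser is past startDocument().
public class EncodingProbe extends DefaultHandler {
    private Locator locator;
    private String encoding;

    @Override
    public void setDocumentLocator(Locator locator) {
        // Just keep the reference; do not query it yet.
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        // The root element comes after the XML declaration, so the
        // Locator is allowed to be queried here. Locator2 support is
        // optional, hence the instanceof check.
        if (encoding == null && locator instanceof Locator2) {
            encoding = ((Locator2) locator).getEncoding();
        }
    }

    // Hypothetical helper: parses the bytes and returns whatever the
    // Locator reported, or null if Locator2 is not supported.
    public static String detect(byte[] xml) throws Exception {
        EncodingProbe probe = new EncodingProbe();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new ByteArrayInputStream(xml)), probe);
        return probe.encoding;
    }

    public static void main(String[] args) throws Exception {
        byte[] doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(detect(doc));
    }
}
```

The sample document is declared as ISO-8859-1 on purpose: its first bytes
look the same as UTF-8, so this is the kind of input where a value read in
startDocument() could differ from one read later.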

Thanks.

[1]
http://xerces.apache.org/xerces2-j/javadocs/api/org/xml/sax/ContentHandler.html#setDocumentLocator(org.xml.sax.Locator)

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrgla...@ca.ibm.com
E-mail: mrgla...@apache.org

Elliotte Harold <elh...@ibiblio.org> wrote on 04/24/2009 08:48:52 AM:

> Do you want the declared encoding or the real encoding? If the
> latter, see here:
>
> http://www.ibm.com/developerworks/library/x-tipsaxxni/
>
> --
> Elliotte Rusty Harold
> elh...@ibiblio.org
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: j-users-unsubscr...@xerces.apache.org
> For additional commands, e-mail: j-users-h...@xerces.apache.org
