I suggest looking at Rome's [1] XmlReader [2][3][4][5]. For instance, it can
be used like this:
InputSource inputSource = new InputSource(url.toExternalForm());
try {
    XmlReader reader = new XmlReader(url);
    inputSource.setCharacterStream(reader);
    inputSource.setEncoding(reader.ge
Hi Michael,
I can confirm this: in my case I am working with ANSI documents, and the
encoding passed to the startDocument() method was consistently "UTF-8",
which is wrong.
So the best bet is to read the declared encoding from the prolog, or
otherwise to rely on the parser's guess...
BR,
Olivier DUR
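
For reference, reading the version and declared encoding from the prolog can be sketched as below. This is only a sketch under assumptions: it uses the JDK's StAX API (javax.xml.stream) rather than the SAX/XNI route discussed in this thread, and the class name PrologInfo and the method readProlog are hypothetical names made up for illustration. It reports what the XML declaration says, which is not necessarily the encoding the bytes are really in.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not part of Xerces or Rome.
final class PrologInfo {

    /**
     * Returns { version, declaredEncoding } as written in the XML declaration.
     * An entry is null when the declaration does not state it.
     */
    static String[] readProlog(InputStream in) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                // A freshly created reader is positioned at START_DOCUMENT,
                // so the XML declaration has already been read at this point.
                return new String[] {
                    reader.getVersion(),
                    reader.getCharacterEncodingScheme()
                };
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Wrapped so that callers do not need to handle a checked exception.
            throw new IllegalArgumentException("Could not read the XML prolog", e);
        }
    }

    public static void main(String[] args) {
        byte[] doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        String[] prolog = readProlog(new ByteArrayInputStream(doc));
        System.out.println("version=" + prolog[0]
                + ", declared encoding=" + prolog[1]);
    }
}
```

The stream is handed to the parser as raw bytes on purpose: wrapping it in a Reader first would force a character encoding before the declaration has been seen.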
Hi Elliotte,
I had a peek at your article and see in the code snippets that what you're
calling the "actual encoding" or "real encoding" actually isn't. The one
passed to startDocument() in XNI is the auto-detected encoding, the one
which Xerces guessed by peeking at the first few bytes in the do
Hi Elliotte,
Well spotted. Actually, I was after the real encoding! However, as you
mentioned this approach is only reliable 90% of the time, so I might end
up using the declared encoding instead. From what I understand, I will
get this information on the startElement() invocation.
To summar
To: j-users@xerces.apache.org
Subject: Re: Accessing xml prolog via SAX
Do you want the declared encoding or the real encoding? If the latter,
see here:
http://www.ibm.com/developerworks/library/x-tipsaxxni/
--
Elliotte Rusty Harold
elh...@ibiblio.org
...@ibiblio.org]
Sent: 24 April 2009 13:49
To: j-users@xerces.apache.org
Subject: Re: Accessing xml prolog via SAX
Do you want the declared encoding or the real encoding? If the latter,
see here:
http://www.ibm.com/developerworks/library/x-tipsaxxni/
--
Elliotte Rusty Harold
elh...@ibiblio.org
Hi!
I am trying to use Xerces to read the XML prolog in order to get the
file encoding/version.
Unfortunately, I could not get it to work (the getEncoding() method
returns ""). So I guess I must have missed something, despite the
fact that I have read the FAQ, the API, as well as previous