On Thu, Mar 13, 2008 at 6:51 PM, Stanimir Stamenkov <[EMAIL PROTECTED]> wrote:
> Wed, 12 Mar 2008 16:25:59 -0300, /Daniel Yokomizo/:
>
> > The only issue I still have
> > is getting the xml declaration info (e.g. version, encoding) but right
> > now I can just ignore it.
>
> That info you should be able to obtain through the Locator2 [1]
> interface.  For example, in your ContentHandler implementation:
>
>     Locator locator;
>
>     public void setDocumentLocator(Locator locator) {
>         this.locator = locator;
>     }
>
>     public void startDocument() {
>         if (locator instanceof Locator2) {
>             Locator2 loc = (Locator2) locator;
>             loc.getXMLVersion();
>             loc.getEncoding();
>         }
>     }
>
> [1] http://xerces.apache.org/xerces2-j/javadocs/api/org/xml/sax/ext/Locator2.html
>
> --
> Stanimir

Thank you, that solved my problem.

I ran into some odd behavior that I think is a bug, but I'm not certain. I created the InputSource from a Reader, did not set the encoding property of the InputSource, and parsed the document. Even though the document has an XML declaration with an explicit encoding, locator.getEncoding() returned null. Creating the InputSource from an InputStream worked, because the parser detects the encoding from the first bytes of the stream.

I think this is a bug because the document itself carries the encoding information, and nothing else supplies a value that could conflict with it (neither explicitly, as with InputSource.setEncoding(), nor implicitly, as in the InputStream case), so the locator should report it.

Should I open a bug report? I searched JIRA and couldn't find anything, so I'm assuming this isn't a known issue. Either way, I switched my code to InputStream and everything works.

Best regards,
Daniel Yokomizo.
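
In case it helps anyone reproduce this, here is a minimal sketch of the two cases described above, assuming the default JAXP parser on the classpath hands the handler a Locator2. The class name LocatorProbe and the helper probe() are made up for illustration and are not part of SAX or Xerces. The results are printed rather than asserted, since they may differ between parser versions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class LocatorProbe {

    /**
     * Parses the source and returns {xmlVersion, encoding} as reported by
     * Locator2. Either entry may be null, and both stay null if the parser
     * does not supply a Locator2.
     */
    static String[] probe(InputSource source) {
        final String[] info = new String[2];
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;
            private boolean captured;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Capture once, on the root element.
                if (!captured && locator instanceof Locator2) {
                    Locator2 loc = (Locator2) locator;
                    info[0] = loc.getXMLVersion();
                    info[1] = loc.getEncoding();
                    captured = true;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return info;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>";

        // Case 1: character stream, no encoding set on the InputSource.
        String[] viaReader = probe(new InputSource(new StringReader(xml)));

        // Case 2: byte stream, so the parser determines the encoding itself.
        String[] viaStream = probe(new InputSource(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))));

        System.out.println("Reader:      version=" + viaReader[0]
                + ", encoding=" + viaReader[1]);
        System.out.println("InputStream: version=" + viaStream[0]
                + ", encoding=" + viaStream[1]);
    }
}
```

The sketch reads the locator in startElement() rather than startDocument(). That is a conservative choice on my part, since I'm not sure every parser has processed the XML declaration by the time startDocument() fires.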