On 10/12/2012 3:19 AM, Alan Bateman wrote:
...
The next step in this effort is dealing with the issue of arbitrary
encodings. The storeToXML method allows the encoding to be specified,
whereas the loadFromXML method assumes that the implementation can
decode the stream and read the encoding declaration. The specification
doesn't make it clear how either method behaves with unrecognized
encodings, and this is something that we need to fix in order to allow
for alternative implementations, in particular tiny parsers that might
not support more than a few encodings.
The webrev with the proposed changes is here:
http://cr.openjdk.java.net/~alanb/8000685/webrev/
This looks good to me.
The proposal is that an implementation minimally supports UTF-8 and
UTF-16, which I think is consistent with the W3C XML specification [2].
Based on a search of a large number of projects, it appears that
these methods aren't used very much, so I don't think this will have
any significant impact. In addition, the same set of encodings [ which
is not exactly the same set as Charset.availableCharsets().keySet() ]
that works today will continue to work when the service provider that
uses JAXP is installed.
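To make the minimal requirement concrete, here is a small sketch of a
round trip through storeToXML and loadFromXML using the two encodings
the proposal requires. The class name XmlPropsRoundTrip and the helper
roundTrip are made up for illustration and are not part of the webrev.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

// Hypothetical helper class, for illustration only.
public class XmlPropsRoundTrip {

    // Writes the properties as XML in the given encoding, then reads
    // them back. loadFromXML takes no encoding argument; it relies on
    // the encoding declaration in the document itself.
    static Properties roundTrip(Properties in, String encoding) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        in.storeToXML(out, "round-trip sketch", encoding);

        Properties copy = new Properties();
        copy.loadFromXML(new ByteArrayInputStream(out.toByteArray()));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.setProperty("greeting", "hello");

        // UTF-8 and UTF-16 are the encodings the proposal requires.
        for (String enc : new String[] {"UTF-8", "UTF-16"}) {
            Properties copy = roundTrip(props, enc);
            System.out.println(enc + " round trip equal: " + props.equals(copy));
        }
    }
}
```

Any other encoding name passed to roundTrip may or may not work,
depending on the implementation that is installed.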
In addition to specifying the required encodings, I have also changed
both methods to specify that UnsupportedEncodingException may be
thrown. In the case of loadFromXML, this is the long-standing
behavior anyway. In the case of storeToXML, the long-standing
behavior is somewhat bizarre. If the method is invoked with an
unsupported encoding then the underlying Xalan code prints a warning
to System.out and changes the encoding under the covers to UTF-8. I've
submitted a bug on this oddity; in the meantime I've added a check in
the platform provider to always fail for charsets that aren't recognized.
Good find.
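For callers that want the fail-fast behavior regardless of which
implementation is installed, a check along the following lines is one
possibility. This is only a caller-side sketch and not the code in the
webrev; the class StrictXmlStore and the method storeStrict are
hypothetical names.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.Properties;

// Hypothetical wrapper, for illustration only.
public class StrictXmlStore {

    // Rejects unknown charset names before anything is written, rather
    // than relying on the underlying XML code to handle them.
    static void storeStrict(Properties props, OutputStream out,
                            String comment, String encoding) throws IOException {
        boolean supported;
        try {
            supported = Charset.isSupported(encoding);
        } catch (IllegalCharsetNameException e) {
            // A syntactically illegal name is treated as unsupported here.
            supported = false;
        }
        if (!supported) {
            throw new UnsupportedEncodingException(encoding);
        }
        props.storeToXML(out, comment, encoding);
    }
}
```

Note that a charset known to the JVM is not necessarily supported by
the XML implementation, so storeToXML itself may still throw.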
Mandy