>However I unearthed https://bz.apache.org/bugzilla/show_bug.cgi?id=58963
>and https://bz.apache.org/bugzilla/show_bug.cgi?id=57031 which I think were
>the bugs for the change. Maybe they
>contain the files you are looking for?

Thank you for the links, Dominik! I finally dug up what I was trying to find earlier. This was the note:

>2. An XML parsing related one:
>Caused by: java.lang.ArrayIndexOutOfBoundsException: 8192
>    at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:329)
>This seems to be a bug in the JDK itself in relation to surrogate Unicode
>characters, see e.g.
>https://bugs.openjdk.java.net/browse/JDK-7156085 and originally
>https://issues.apache.org/jira/browse/XERCESJ-1257 for more detailed
>discussion.
>Seems only JDK 9 has a fix for this :(
>However it is very rare, only 6 times in 1 million documents, so I think it
>outweighs the gain from using the JDK XML Parser.
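
For what it's worth, a workaround commonly suggested for this kind of UTF8Reader failure is to decode the bytes with the JDK's own charset decoder and hand the parser a Reader, so that the Xerces-internal UTF8Reader is not used at all. Below is a minimal sketch, assuming the input is known to be UTF-8 and has no byte order mark. The class name `SurrogateSafeParse` and the `parse` helper are made up for illustration. The `main` method only places a supplementary character across byte offset 8192, which is where the exception index suggests a buffer boundary lies; that it matches the exact trigger is an assumption, and I have not confirmed that it reproduces the bug.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of POI or the JDK.
public class SurrogateSafeParse {

    // Decode with java.io.InputStreamReader instead of letting the parser
    // pick its own byte-to-char reader. Assumes the stream is UTF-8.
    static Document parse(InputStream in) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        return builder.parse(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        // Pad the document so a 4-byte character straddles byte offset 8192.
        StringBuilder sb = new StringBuilder("<r>");
        while (sb.length() < 8190) {
            sb.append('a');
        }
        sb.appendCodePoint(0x1F600); // supplementary character (surrogate pair)
        sb.append("</r>");

        byte[] bytes = sb.toString().getBytes(StandardCharsets.UTF_8);
        Document doc = parse(new ByteArrayInputStream(bytes));
        System.out.println("parsed root: " + doc.getDocumentElement().getTagName());
    }
}
```

The trade-off is that passing a Reader bypasses the parser's encoding detection, so the encoding named in the XML declaration is not honored and a leading BOM would have to be stripped separately.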