Thank you for all the info Michael, very helpful.

Gary
On Mon, Aug 13, 2012 at 4:51 PM, Michael Glavassevich <mrgla...@ca.ibm.com> wrote:

> Hi Gary,
>
> Gary Gregory <garydgreg...@gmail.com> wrote on 13/08/2012 02:27:33 PM:
>
> > Hi Michael,
> >
> > I’ve not caught one in the savannah either! I've not had a customer
> > request for it either, that, or the request did not make it through
> > our sales engineers, professional services, or tech support all the way
> > to me.
> >
> > Our products are XML and buzzword compliant and I am checking my Ps
> > and Qs. So, at this point, the point is rather academic as you mention.
>
> XML parsers are only required to support UTF-8 and UTF-16. Support for any
> other encodings is icing on the cake.
>
> > I am aware of the inefficiencies involved, but our customers can
> > decide how efficient they want to be for themselves, sometimes they
> > have no control over the format of the documents they have to
> > process with our software. For those who can control the format, I
> > do not know if someone has tried UTF-32, watched it blow up and then
> > switched to something.
> >
> > Now, out of curiosity, I do notice a
> > org.apache.xerces.impl.io.UCSReader class in Xerces which is used
> > from a couple of places.
> >
> > Is that not hooked up in all the right spots?
>
> It is, but if presented with a UTF-32 BOM, Xerces won't hit the code path
> where the UCSReader would be used since its encoding auto-detector doesn't
> recognize UTF-32 BOM byte sequences. It's probably just defaulting to UTF-8
> (since it has no better guess) and then bombs out.
>
> Assuming Xerces did support UTF-32 the UCSReader might not be the right
> reader to use anyway. A compliant UTF-32 Reader might require more error
> checking (e.g. to reject non-characters, like the byte sequences that would
> be used to represent surrogates in UTF-16).
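
For the archives, here is a rough sketch of what BOM sniffing that also recognizes UTF-32 could look like, plus the kind of extra check a strict UTF-32 reader would need. This is not Xerces code: BomSniffer, detect and isScalarValue are names I made up for illustration, and I am assuming the caller has already read the first few bytes of the stream.

```java
/** Hypothetical helper, not part of Xerces: guesses an encoding from a leading BOM. */
final class BomSniffer {

    /** Returns an encoding name for the BOM at the start of b, or null if no BOM is found. */
    public static String detect(byte[] b) {
        // The 4-byte UTF-32 marks are tested before the 2-byte UTF-16 marks,
        // because FF FE 00 00 (UTF-32LE) begins with FF FE (UTF-16LE).
        if (startsWith(b, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(b, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(b, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(b, 0xFF, 0xFE)) return "UTF-16LE";
        return null; // no BOM: the caller falls back to its own default
    }

    /** Compares the leading bytes of b (as unsigned values) against prefix. */
    private static boolean startsWith(byte[] b, int... prefix) {
        if (b.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((b[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    /**
     * True if a decoded 32-bit unit is a Unicode scalar value: within
     * U+0000..U+10FFFF and not in the surrogate range U+D800..U+DFFF.
     */
    public static boolean isScalarValue(int cp) {
        return cp >= 0 && cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
    }
}
```

A reader built on this would call detect on the first four bytes, then reject any 32-bit unit for which isScalarValue returns false. Whether that is the right level of strictness for XML is a separate question; the sketch only covers the surrogate case mentioned above.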
>
> > Gary
> >
> > On Mon, Aug 13, 2012 at 2:07 PM, Michael Glavassevich <mrgla...@ca.ibm.com> wrote:
> > Hi Gary,
> >
> > There haven't been any plans for UTF-32 support. It seems you're the
> > first [1] (and only) one who has asked about it on the project lists.
> >
> > Is this just an academic question or do you have an actual need for it?
> >
> > I must say I've never seen a UTF-32 encoded document in the wild. In
> > my opinion it's a very inefficient encoding. Always uses 32 bits to
> > represent a character when the largest Unicode code point only
> > requires 21 bits. UTF-8 and UTF-16 only ever use that much space for
> > supplementary characters (i.e. code points greater than U+FFFF).
> >
> > Thanks.
> >
> > [1] http://xerces-j.markmail.org/search/?q=UTF-32
> >
> > Michael Glavassevich
> > XML Technologies and WAS Development
> > IBM Toronto Lab
> > E-mail: mrgla...@ca.ibm.com
> > E-mail: mrgla...@apache.org
> >
> > Gary Gregory <garydgreg...@gmail.com> wrote on 13/08/2012 01:49:46 PM:
> >
> > > Hi All:
> > >
> > > Any plans to support UTF-32 BOM?
> > >
> > > Currently, if I parse a UTF-32 document I get "content not expected
> > > in prolog" error.
> > >
> > > Thank you,
> > > Gary
> > >
> > > --
> > > E-Mail: garydgreg...@gmail.com | ggreg...@apache.org
> > > JUnit in Action, 2nd Ed: http://bit.ly/ECvg0
> > > Spring Batch in Action: http://bit.ly/bqpbCK
> > > Blog: http://garygregory.wordpress.com
> > > Home: http://garygregory.com/
> > > Tweet! http://twitter.com/GaryGregory
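
P.S. To put numbers on the size argument quoted above, here is a small sketch that counts how many bytes a string needs in each encoding. EncodedSize and its methods are hypothetical names of mine. The UTF-32 figure is computed as four bytes per code point instead of going through a UTF-32 charset, so it relies only on the charsets every Java platform must provide.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: byte counts for one string in three Unicode encodings (no BOM). */
final class EncodedSize {

    public static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static int utf16Bytes(String s) {
        // UTF_16BE does not emit a BOM, so this counts the text only.
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static int utf32Bytes(String s) {
        // UTF-32 always spends 4 bytes per code point.
        return 4 * s.codePointCount(0, s.length());
    }
}
```

For an ASCII-only string such as "<a/>" the three methods return 4, 8 and 16. For a single supplementary character such as U+1F600 all three return 4, which is the only case where UTF-8 and UTF-16 need as much space as UTF-32.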