Thank you for all the info Michael, very helpful.

Gary

On Mon, Aug 13, 2012 at 4:51 PM, Michael Glavassevich
<mrgla...@ca.ibm.com> wrote:

> Hi Gary,
>
> Gary Gregory <garydgreg...@gmail.com> wrote on 13/08/2012 02:27:33 PM:
>
>
> > Hi Michael,
> >
> > I've not caught one in the savannah either! I've not had a customer
> > request for it either; that, or the request did not make it through
> > our sales engineers, professional services, or tech support all the way
> > to me.
> >
> > Our products are XML and buzzword compliant and I am checking my Ps
> > and Qs. So, at this point, the question is rather academic, as you mention.
>
> XML parsers are only required to support UTF-8 and UTF-16. Support for any
> other encodings is icing on the cake.
>
> > I am aware of the inefficiencies involved, but our customers can
> > decide for themselves how efficient they want to be. Sometimes they
> > have no control over the format of the documents they have to
> > process with our software. For those who can control the format, I
> > do not know if anyone has tried UTF-32, watched it blow up, and then
> > switched to something else.
> >
> > Now, out of curiosity, I do notice an
> > org.apache.xerces.impl.io.UCSReader class in Xerces which is used
> > from a couple of places.
> >
> > Is that not hooked up in all the right spots?
>
> It is, but if presented with a UTF-32 BOM, Xerces won't hit the code path
> where the UCSReader would be used since its encoding auto-detector doesn't
> recognize UTF-32 BOM byte sequences. It probably just defaults to UTF-8
> (since it has no better guess) and then bombs out.
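
For illustration, here is a minimal sketch of BOM sniffing that also recognizes the UTF-32 byte order marks (00 00 FE FF for big-endian, FF FE 00 00 for little-endian). It is not Xerces code: the class name BomSniffer and its detect method are hypothetical, and the sketch says nothing about how the real auto-detector is structured. The one point it is meant to show is ordering: the UTF-32LE BOM starts with the same two bytes as the UTF-16LE BOM, so the four-byte patterns have to be tested first.

```java
/** Hypothetical helper, not part of Xerces: guesses an encoding from a leading BOM. */
final class BomSniffer {
    private BomSniffer() {}

    /**
     * Returns an encoding name for the BOM at the start of b,
     * or null if no BOM is recognized.
     */
    static String detect(byte[] b) {
        // UTF-32 is tested before UTF-16 because the UTF-32LE BOM
        // (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
        if (startsWith(b, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(b, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(b, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(b, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    /** True if b is at least as long as prefix and starts with those byte values. */
    private static boolean startsWith(byte[] b, int... prefix) {
        if (b.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            // Mask to compare as unsigned, since Java bytes are signed.
            if ((b[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

A null return means no BOM was found, in which case a parser has to fall back on something else, such as the byte pattern of the XML declaration itself.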
>
> Assuming Xerces did support UTF-32, the UCSReader might not be the right
> reader to use anyway. A compliant UTF-32 reader might require more error
> checking (e.g. to reject values that are not valid characters, like the
> code points reserved for surrogates in UTF-16).
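
As a rough sketch of the kind of extra checking meant here, the following decodes big-endian UTF-32 and rejects anything that is not a Unicode scalar value. The class StrictUtf32 and its decodeBE method are hypothetical names, not Xerces API, and a real reader would stream the input and report errors through the parser's own exception types rather than IllegalArgumentException.

```java
/** Hypothetical strict decoder for big-endian UTF-32; not part of Xerces. */
final class StrictUtf32 {
    private StrictUtf32() {}

    /** Decodes UTF-32BE bytes (without a BOM) into a String, validating each code unit. */
    static String decodeBE(byte[] in) {
        if (in.length % 4 != 0) {
            throw new IllegalArgumentException("input length is not a multiple of 4");
        }
        StringBuilder out = new StringBuilder(in.length / 4);
        for (int i = 0; i < in.length; i += 4) {
            // Assemble one 32-bit code unit, most significant byte first.
            int cp = ((in[i] & 0xFF) << 24)
                   | ((in[i + 1] & 0xFF) << 16)
                   | ((in[i + 2] & 0xFF) << 8)
                   |  (in[i + 3] & 0xFF);
            // Values above U+10FFFF (including those that overflow to negative)
            // are outside the Unicode code space.
            if (cp < 0 || cp > 0x10FFFF) {
                throw new IllegalArgumentException("code unit out of range at byte " + i);
            }
            // U+D800..U+DFFF are reserved for UTF-16 surrogates and are not
            // valid on their own in UTF-32.
            if (cp >= 0xD800 && cp <= 0xDFFF) {
                throw new IllegalArgumentException("surrogate code point at byte " + i);
            }
            out.appendCodePoint(cp);
        }
        return out.toString();
    }
}
```

An XML parser would additionally have to apply the XML Char production on top of this, since not every Unicode scalar value is a legal XML character.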
>
> > Gary
>
> > On Mon, Aug 13, 2012 at 2:07 PM, Michael Glavassevich
> > <mrgla...@ca.ibm.com> wrote:
> > Hi Gary,
> >
> > There haven't been any plans for UTF-32 support. It seems you're the
> > first [1] (and only) one who has asked about it on the project lists.
> >
> > Is this just an academic question or do you have an actual need for it?
> >
> > I must say I've never seen a UTF-32 encoded document in the wild. In
> > my opinion it's a very inefficient encoding. It always uses 32 bits to
> > represent a character when the largest Unicode code point only
> > requires 21 bits. UTF-8 and UTF-16 only ever use that much space for
> > supplementary characters (i.e. code points greater than U+FFFF).
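
The size argument is easy to check with a few lines of Java. This is only a sketch: EncodedSize is a made-up class name, and it assumes the runtime provides a "UTF-32BE" charset (OpenJDK does, but it is not one of the charsets every Java platform is required to support).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that counts encoded bytes per code point. */
final class EncodedSize {
    private EncodedSize() {}

    /** Number of bytes needed to encode a single code point in the given charset. */
    static int bytes(int codePoint, Charset charset) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(charset).length;
    }

    /** Sizes in UTF-8, UTF-16 and UTF-32, in that order. */
    static int[] sizes(int codePoint) {
        // The big-endian variants are used so that no BOM is added to the count.
        // Assumption: a "UTF-32BE" charset is available on this runtime.
        return new int[] {
            bytes(codePoint, StandardCharsets.UTF_8),
            bytes(codePoint, StandardCharsets.UTF_16BE),
            bytes(codePoint, Charset.forName("UTF-32BE"))
        };
    }
}
```

For U+0041 this gives 1, 2 and 4 bytes; for U+20AC it gives 3, 2 and 4; only for a supplementary character such as U+1F600 do all three encodings need 4 bytes.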
> >
> > Thanks.
> >
> > [1] http://xerces-j.markmail.org/search/?q=UTF-32
> >
> > Michael Glavassevich
> > XML Technologies and WAS Development
> > IBM Toronto Lab
> > E-mail: mrgla...@ca.ibm.com
> > E-mail: mrgla...@apache.org
> >
> > Gary Gregory <garydgreg...@gmail.com> wrote on 13/08/2012 01:49:46 PM:
> >
> >
> > > Hi All:
> > >
> > > Any plans to support UTF-32 BOM?
> > >
> > > Currently, if I parse a UTF-32 document I get a 'content not expected
> > > in prolog' error.
> > >
> > > Thank you,
> > > Gary
> > >
> > > --
> > > E-Mail: garydgreg...@gmail.com | ggreg...@apache.org
> > > JUnit in Action, 2nd Ed: http://bit.ly/ECvg0
> > > Spring Batch in Action: http://bit.ly/bqpbCK
> > > Blog: http://garygregory.wordpress.com
> > > Home: http://garygregory.com/
> > > Tweet! http://twitter.com/GaryGregory
> >
> >
> >
> > --
> > E-Mail: garydgreg...@gmail.com | ggreg...@apache.org
> > JUnit in Action, 2nd Ed: http://bit.ly/ECvg0
> > Spring Batch in Action: http://bit.ly/bqpbCK
> > Blog: http://garygregory.wordpress.com
> > Home: http://garygregory.com/
> > Tweet! http://twitter.com/GaryGregory
>
> Michael Glavassevich
> XML Technologies and WAS Development
> IBM Toronto Lab
> E-mail: mrgla...@ca.ibm.com
> E-mail: mrgla...@apache.org
>



-- 
E-Mail: garydgreg...@gmail.com | ggreg...@apache.org
JUnit in Action, 2nd Ed: http://bit.ly/ECvg0
Spring Batch in Action: http://bit.ly/bqpbCK
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory
