I'm okay with a flag, but I think we should be clear about where we think
most of the work will be (until such time as someone actually does the
work for big-endian support).  Such as:

"Existing Arrow implementations are focused on exposing and operating on
little-endian data and expect that format. The IPC/RPC metadata expresses
this orientation as a property for future expansion. In the future, systems
may generate or expect big-endian data and will need to set the endian
orientation as such."
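
To make that concrete, here is a rough sketch (in C++) of the kind of
check an IPC reader could do. Every name below is hypothetical rather
than taken from any existing Arrow header; it is only meant to show the
"raise on non-native data" behavior we are discussing:

    #include <cstdint>
    #include <stdexcept>

    // Hypothetical metadata enum; the actual field name and values
    // would be whatever we settle on in the IPC schema.
    enum class Endianness { Little, Big };

    // Detect the host byte order at runtime, portably.
    inline Endianness HostEndianness() {
      const std::uint16_t probe = 1;
      return *reinterpret_cast<const std::uint8_t*>(&probe) == 1
                 ? Endianness::Little
                 : Endianness::Big;
    }

    // Initial implementations can simply refuse non-native data.
    void CheckEndianness(Endianness from_metadata) {
      if (from_metadata != HostEndianness()) {
        throw std::runtime_error(
            "Non-native endianness is not supported yet");
      }
    }

Detecting the host order at runtime keeps the check portable across
compilers without relying on platform-specific macros.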

On Sat, Apr 23, 2016 at 4:01 PM, Zheng, Kai <kai.zh...@intel.com> wrote:

> > My assumption is that most deployments for the systems we are
> > targeting are going to be homogeneous in terms of byte ordering. I
> > think this can allow initial implementations to ignore support for
> > non-native byte ordering (i.e. raise an exception if detected).
> > Has this been others' experience?
>
> The assumption sounds good in the big data domain, where servers are
> very likely to be homogeneous in most cases (as far as I have seen),
> though clients may be a little more complex. I guess the assumption
> will make it much easier for Arrow to achieve much better performance.
>
> > I don't see a problem adding endianness as a flag in the IPC
> > metadata, and raising exceptions if big-endian data is ever
> > encountered for the time being.
>
> Yeah, an endianness flag would be needed in IPC to let the other side
> know the endianness of the wire packets, since there is a potential
> need to byte-swap in some cases.
>
> Regards,
> Kai
>
> -----Original Message-----
> From: Wes McKinney [mailto:w...@cloudera.com]
> Sent: Saturday, April 23, 2016 11:07 PM
> To: dev@arrow.apache.org; Micah Kornfield <emkornfi...@gmail.com>
> Subject: Re: Byte ordering/Endianness revisited
>
> I don't see a problem adding endianness as a flag in the IPC metadata,
> and raising exceptions if big-endian data is ever encountered for the
> time being.
> Since big-endian hardware is so exotic nowadays, I don't think it's
> unreasonable to expect IBM or other hardware vendors requiring big-endian
> support to contribute the byte-swapping logic when the time comes. I
> suppose this just means we'll have to be careful in code reviews should any
> algorithms get written that assume a particular endianness. Will defer to
> others' judgment on this ultimately, though.
>
> On Fri, Apr 22, 2016 at 11:59 PM, Micah Kornfield <emkornfi...@gmail.com>
> wrote:
> > This was discussed on a previous thread
> > (https://mail-archives.apache.org/mod_mbox/arrow-dev/201604.mbox/%3CCAKa9qDkppFrJQCHsSN7CmkJCzOTAhGPERMd_u2CMZANNQGtNyw%40mail.gmail.com%3E;
> > the relevant snippet is pasted below). But I'd like to reopen this
> > because it appears Spark supports big-endian systems (high-end IBM
> > hardware). Right now the spec says:
> >
> > "The Arrow format is little endian."
> >
> > I'd like to change this to something like:
> >
> > "Algorithms written against Arrow Arrays should assume native
> > byte-ordering. Endianness is communicated via IPC/RPC metadata and
> > conversion to native byte-ordering is handled via IPC/RPC
> > implementations."
> >
> > What do other people think?
> >
> > My assumption is that most deployments for the systems we are
> > targeting are going to be homogeneous in terms of byte ordering. I
> > think this can allow initial implementations to ignore support for
> > non-native byte ordering (i.e. raise an exception if detected).
> > Has this been others' experience?
> >
> > Thanks,
> > Micah
> >
> > Snippet from the original thread:
> >>>
> >>> 1.  For completeness it might be useful to add a statement that the
> >>> byte order (endianness) is platform native.
> >
> >
> >> Actually, Arrow is little-endian. It is an oversight that we haven't
> >> documented it as such. One of the key capabilities is to push it
> >> across the wire between separate systems without serialization (not
> >> just IPC). As such, we have to pick an endianness. If there is a huge
> >> need for a second big-endian encoding, we'll need to extend the spec
> >> to support that as a property.
>
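
For reference, the byte-swapping logic mentioned above might look
roughly like the following sketch (again hypothetical C++, not code
from any Arrow implementation; a real version would dispatch on the
value width recorded in the schema):

    #include <cstdint>
    #include <cstring>

    // Swap the byte order of every 32-bit value in a buffer, in place.
    void SwapInt32Buffer(std::uint8_t* data, std::int64_t num_values) {
      for (std::int64_t i = 0; i < num_values; ++i) {
        std::uint32_t value;
        std::memcpy(&value, data + i * 4, sizeof(value));
        value = ((value & 0x000000FFu) << 24) |
                ((value & 0x0000FF00u) << 8)  |
                ((value & 0x00FF0000u) >> 8)  |
                ((value & 0xFF000000u) >> 24);
        std::memcpy(data + i * 4, &value, sizeof(value));
      }
    }

This is the kind of logic a big-endian port would need to contribute,
applied once at the IPC boundary rather than inside every algorithm.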
