I'm okay with a flag, but I think we should be clear about where we expect most of the work to be (until such time as someone actually does the work for big-endian support). Such as:
"Existing Arrow implementations are focused on exposing and operating on little-endian data and expect that format. The IPC/RPC metadata expresses this orientation as a property for future expansion. In the future, systems may generate or expect big-endian data and will need to set the endian orientation as such." On Sat, Apr 23, 2016 at 4:01 PM, Zheng, Kai <kai.zh...@intel.com> wrote: > > My assumption is that most deployments for the systems we are > > targeting are going to be homogenous in terms of byte ordering. I > > think this can allow initial implementations to ignore support for > > non-native byte ordering (i.e. raise an exception if detected). > > Has this been other's experience? > > The assumption sounds good in the big data domain where servers are very > likely to be homogenous in most cases (as far as I learned), though clients > may be a little complex. I guess the assumption will boost Arrow much > easier achieving much better performance. > > > I don't see a problem adding endianness as a flag in the IPC metadata, > and raise exceptions if big-endian data is ever encountered for the time > being. > > Yeah an endianness flag would be needed in IPC to let the other side to > know the endianness in the wire packets since there is a potential need to > tweak in some cases. > > Regards, > Kai > > -----Original Message----- > From: Wes McKinney [mailto:w...@cloudera.com] > Sent: Saturday, April 23, 2016 11:07 PM > To: dev@arrow.apache.org; Micah Kornfield <emkornfi...@gmail.com> > Subject: Re: Byte ordering/Endianness revisited > > I don't see a problem adding endianness as a flag in the IPC metadata, and > raise exceptions if big-endian data is ever encountered for the time being. > Since big-endian hardware is so exotic nowadays, I don't think it's > unreasonable to expect IBM or other hardware vendors requiring big-endian > support to contribute the byte-swapping logic when the time comes. I > suppose this just means we'll have to be careful in code reviews should any > algorithms get written that assume a particular endianness. Will defer to > others' judgment on this ultimately, though. > > On Fri, Apr 22, 2016 at 11:59 PM, Micah Kornfield <emkornfi...@gmail.com> > wrote: > > This was discussed on a previous thread > > (https://mail-archives.apache.org/mod_mbox/arrow-dev/201604.mbox/%3CCA > > Ka9qDkppFrJQCHsSN7CmkJCzOTAhGPERMd_u2CMZANNQGtNyw%40mail.gmail.com%3E > > the relevant snippet is pasted below). But I'd like to reopen this > > because it appears Spark supports big endian systems (high end IBM > > hardware). Right now the spec says: > > > > "The Arrow format is little endian." > > > > I'd like to change this to something like: > > > > "Algorithms written against Arrow Arrays should assume native > > byte-ordering. Endianness is communicated via IPC/RPC metadata and > > conversion to native byte-ordering is handled via IPC/RPC > > implementations". > > > > What do other people think? > > > > My assumption is that most deployments for the systems we are > > targeting are going to be homogenous in terms of byte ordering. I > > think this can allow initial implementations to ignore support for > > non-native byte ordering (i.e. raise an exception if detected). > > Has this been other's experience? > > > > Thanks, > > Micah > > > > Snippet from the original thread: > >>> > >>> 1. For completeness it might be useful to add a statement that the > >>> byte order (endianness) is platform native. > > > > > >> Actually, Arrow is little-endian. 
It is an oversight that we haven't > >>documented it as such. One of the key capabilities is to push it > >>across the wire between separate systems without serialization (not > >>just IPC). As such, we have to pick an endianness. If there is a huge > >>need for a second big-endian encoding, we'll need to extend the spec to > support that as a property. >
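On the receiving side, the flag-plus-exception behavior discussed above
might look roughly like the following. This is a minimal C++ sketch under
stated assumptions, not actual Arrow code: the Endianness enum and the
HostEndianness/CheckStreamEndianness helpers are hypothetical names, and
the real flag would be defined in the IPC metadata schema.

    #include <cstdint>
    #include <stdexcept>

    // Hypothetical mirror of the proposed IPC metadata flag; the real
    // field name and values would live in Arrow's metadata schema.
    enum class Endianness { kLittle, kBig };

    // Detect the byte order of the running machine.
    inline Endianness HostEndianness() {
      const uint16_t probe = 1;
      return *reinterpret_cast<const uint8_t*>(&probe) == 1
                 ? Endianness::kLittle
                 : Endianness::kBig;
    }

    // Initial implementations can simply refuse non-native data, as
    // proposed in the thread.
    inline void CheckStreamEndianness(Endianness stream_endianness) {
      if (stream_endianness != HostEndianness()) {
        throw std::runtime_error(
            "non-native endianness in IPC metadata; "
            "byte swapping not implemented");
      }
    }

An IPC reader would call CheckStreamEndianness with the value read from
the stream's metadata before interpreting any record batch buffers.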
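If big-endian support were contributed later, the byte-swapping logic Wes
mentions would sit in the IPC/RPC layer, converting buffers to native
order on receipt so that algorithms can keep assuming native
byte-ordering, per Micah's proposed wording. A sketch of that conversion
for a buffer of 32-bit values, again with hypothetical helper names
(ByteSwap32, SwapInt32Buffer):

    #include <cstddef>
    #include <cstdint>

    // Reverse the bytes of a single 32-bit value.
    inline uint32_t ByteSwap32(uint32_t v) {
      return ((v & 0x000000FFu) << 24) |
             ((v & 0x0000FF00u) << 8)  |
             ((v & 0x00FF0000u) >> 8)  |
             ((v & 0xFF000000u) >> 24);
    }

    // Swap an entire buffer of 32-bit values in place, e.g. the data
    // buffer of an Int32 array received from a machine with the other
    // byte order.
    inline void SwapInt32Buffer(uint32_t* data, size_t length) {
      for (size_t i = 0; i < length; ++i) {
        data[i] = ByteSwap32(data[i]);
      }
    }

Only multi-byte value and offset buffers would need this treatment;
bit-packed validity bitmaps are defined byte by byte, so they would
presumably be left untouched.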