> My assumption is that most deployments for the systems we are
> targeting are going to be homogeneous in terms of byte ordering. I
> think this can allow initial implementations to ignore support for
> non-native byte ordering (i.e. raise an exception if detected).
> Has this been others' experience?
The assumption sounds good in the big data domain, where servers are very likely to be homogeneous in most cases (as far as I have learned), though clients may be a little more complex. I think this assumption will make it much easier for Arrow to achieve good performance.

> I don't see a problem adding endianness as a flag in the IPC metadata, and
> raising exceptions if big-endian data is ever encountered for the time being.

Yeah, an endianness flag would be needed in IPC to let the other side know the endianness of the wire packets, since there is a potential need to tweak it in some cases.

Regards,
Kai

-----Original Message-----
From: Wes McKinney [mailto:w...@cloudera.com]
Sent: Saturday, April 23, 2016 11:07 PM
To: dev@arrow.apache.org; Micah Kornfield <emkornfi...@gmail.com>
Subject: Re: Byte ordering/Endianness revisited

I don't see a problem adding endianness as a flag in the IPC metadata, and raising exceptions if big-endian data is ever encountered for the time being. Since big-endian hardware is so exotic nowadays, I don't think it's unreasonable to expect IBM or other hardware vendors requiring big-endian support to contribute the byte-swapping logic when the time comes. I suppose this just means we'll have to be careful in code reviews should any algorithms get written that assume a particular endianness.

Will defer to others' judgment on this ultimately, though.

On Fri, Apr 22, 2016 at 11:59 PM, Micah Kornfield <emkornfi...@gmail.com> wrote:
> This was discussed on a previous thread
> (https://mail-archives.apache.org/mod_mbox/arrow-dev/201604.mbox/%3CCAKa9qDkppFrJQCHsSN7CmkJCzOTAhGPERMd_u2CMZANNQGtNyw%40mail.gmail.com%3E
> -- the relevant snippet is pasted below). But I'd like to reopen this
> because it appears Spark supports big-endian systems (high-end IBM
> hardware). Right now the spec says:
>
> "The Arrow format is little endian."
>
> I'd like to change this to something like:
>
> "Algorithms written against Arrow Arrays should assume native
> byte-ordering.
> Endianness is communicated via IPC/RPC metadata, and
> conversion to native byte-ordering is handled via IPC/RPC
> implementations."
>
> What do other people think?
>
> My assumption is that most deployments for the systems we are
> targeting are going to be homogeneous in terms of byte ordering. I
> think this can allow initial implementations to ignore support for
> non-native byte ordering (i.e. raise an exception if detected).
> Has this been others' experience?
>
> Thanks,
> Micah
>
> Snippet from the original thread:
>
>>> 1. For completeness it might be useful to add a statement that the
>>> byte order (endianness) is platform native.
>
>> Actually, Arrow is little-endian. It is an oversight that we haven't
>> documented it as such. One of the key capabilities is to push it
>> across the wire between separate systems without serialization (not
>> just IPC). As such, we have to pick an endianness. If there is a huge
>> need for a second big-endian encoding, we'll need to extend the spec
>> to support that as a property.
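[Editor's note: for illustration, the "exception if detected" behavior proposed in this thread can be sketched as below. This is a hypothetical sketch, not Arrow's actual API: the `endianness` metadata key and the `check_schema_endianness` helper are invented names standing in for whatever flag the IPC metadata would carry.]

```python
import sys

NATIVE = sys.byteorder  # "little" or "big" on the running host

def check_schema_endianness(schema_metadata: dict) -> None:
    """Raise if the declared wire endianness differs from the host's.

    Hypothetical sketch: the "endianness" key is an assumed stand-in
    for a flag in the IPC metadata, not a real Arrow field name.
    """
    # An absent flag defaults to little-endian, matching the spec text
    # quoted above ("The Arrow format is little endian.").
    wire = schema_metadata.get("endianness", "little")
    if wire != NATIVE:
        raise NotImplementedError(
            f"received {wire}-endian data on a {NATIVE}-endian host; "
            "byte-swapping is not yet implemented"
        )

# Matching metadata passes silently; mismatched metadata raises,
# i.e. the "raise an exception if detected" behavior proposed above.
check_schema_endianness({"endianness": sys.byteorder})
```

This keeps homogeneous deployments on the zero-copy fast path while leaving room for a vendor-contributed byte-swapping path later, as Wes suggests.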