I don't see a problem with adding endianness as a flag in the IPC metadata and raising an exception whenever big-endian data is encountered, at least for the time being. Since big-endian hardware is so exotic nowadays, I don't think it's unreasonable to expect IBM or other vendors that need big-endian support to contribute the byte-swapping logic when the time comes. I suppose this just means we'll have to be careful in code reviews should any algorithms get written that assume a particular endianness. I'll defer to others' judgment on this, though.
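
To make that concrete, here's a rough sketch (in Python, with made-up names; the real flag and its values would be defined by whatever enum ends up in the IPC metadata schema) of the kind of check an implementation could do on read:

import sys

# Made-up constants for illustration; the actual values would come
# from the endianness flag proposed for the IPC metadata.
LITTLE_ENDIAN = 0
BIG_ENDIAN = 1

def check_endianness(metadata_endianness):
    """Reject non-native byte order until byte-swapping is implemented."""
    native = LITTLE_ENDIAN if sys.byteorder == "little" else BIG_ENDIAN
    if metadata_endianness != native:
        raise NotImplementedError(
            "Arrow data with non-native byte order is not yet supported")

The nice property is that the check is cheap and local to the IPC layer, so algorithms written against the arrays never have to think about it.
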
On Fri, Apr 22, 2016 at 11:59 PM, Micah Kornfield <emkornfi...@gmail.com> wrote:
> This was discussed on a previous thread
> (https://mail-archives.apache.org/mod_mbox/arrow-dev/201604.mbox/%3CCAKa9qDkppFrJQCHsSN7CmkJCzOTAhGPERMd_u2CMZANNQGtNyw%40mail.gmail.com%3E;
> the relevant snippet is pasted below), but I'd like to reopen this
> because it appears Spark supports big-endian systems (high-end IBM
> hardware). Right now the spec says:
>
> "The Arrow format is little endian."
>
> I'd like to change this to something like:
>
> "Algorithms written against Arrow Arrays should assume native
> byte-ordering. Endianness is communicated via IPC/RPC metadata and
> conversion to native byte-ordering is handled via IPC/RPC
> implementations."
>
> What do other people think?
>
> My assumption is that most deployments for the systems we are
> targeting are going to be homogeneous in terms of byte ordering. I
> think this can allow initial implementations to ignore support for
> non-native byte ordering (i.e. raise an exception if detected).
> Has this been others' experience?
>
> Thanks,
> Micah
>
> Snippet from the original thread:
>
>>> 1. For completeness it might be useful to add a statement that the
>>> byte order (endianness) is platform native.
>
>> Actually, Arrow is little-endian. It is an oversight that we haven't
>> documented it as such. One of the key capabilities is to push it
>> across the wire between separate systems without serialization (not
>> just IPC). As such, we have to pick an endianness. If there is a huge
>> need for a second big-endian encoding, we'll need to extend the spec
>> to support that as a property.
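
For what it's worth, the "conversion to native byte-ordering is handled via IPC/RPC implementations" part of Micah's proposed wording could be a thin shim at ingest time. A hand-wavy Python sketch for fixed-width buffers only (ignoring validity bitmaps, variable-width types, and nested layouts, which would all need type-aware handling):

import array
import sys

def to_native_order(buf, itemsize, source_is_little):
    """Byte-swap a fixed-width buffer into native order if needed.

    Toy example only; real Arrow buffers need per-type handling.
    """
    if (sys.byteorder == "little") == source_is_little:
        return buf  # already native byte order; zero-copy passthrough
    typecode = {2: "h", 4: "i", 8: "q"}[itemsize]  # 16/32/64-bit widths
    swapped = array.array(typecode, buf)
    swapped.byteswap()
    return swapped.tobytes()

On a homogeneous little-endian deployment the swap branch never fires, which is why letting initial implementations skip it (and just raise, as above) seems safe.
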