This was discussed on a previous thread (https://mail-archives.apache.org/mod_mbox/arrow-dev/201604.mbox/%3CCAKa9qDkppFrJQCHsSN7CmkJCzOTAhGPERMd_u2CMZANNQGtNyw%40mail.gmail.com%3E; the relevant snippet is pasted below), but I'd like to reopen it because it appears Spark supports big-endian systems (high-end IBM hardware). Right now the spec says:
"The Arrow format is little endian." I'd like to change this to something like: "Algorithms written against Arrow Arrays should assume native byte-ordering. Endianness is communicated via IPC/RPC metadata and conversion to native byte-ordering is handled via IPC/RPC implementations". What do other people think? My assumption is that most deployments for the systems we are targeting are going to be homogenous in terms of byte ordering. I think this can allow initial implementations to ignore support for non-native byte ordering (i.e. raise an exception if detected). Has this been other's experience? Thanks, Micah Snippet from the original thread: >> >> 1. For completeness it might be useful to add a statement that the >> byte order (endianness) is platform native. > Actually, Arrow is little-endian. It is an oversight that we haven't > documented it as >such. One of the key capabilities is to push it across the wire between >separate >systems without serialization (not just IPC). As such, we have to pick an >endianness. If there is a huge need for a second big-endian encoding, we'll >need to >extend the spec to support that as a property.