> My assumption is that most deployments for the systems we are
> targeting are going to be homogeneous in terms of byte ordering. I
> think this can allow initial implementations to ignore support for
> non-native byte ordering (i.e. raise an exception if detected).
> Has this been others' experience?
The assumption sounds good in the big data domain, where servers are very likely to be homogeneous in most cases (as far as I have learned), though clients may be a little more complex. I think this assumption will make it much easier for Arrow to achieve good performance.

> I don't see a problem adding endianness as a flag in the IPC metadata, and
> raising exceptions if big-endian data is ever encountered for the time being.

Yeah, an endianness flag would be needed in IPC to let the other side know the endianness of the wire packets, since there is a potential need to tweak it in some cases.

Regards,
Kai

-----Original Message-----
From: Wes McKinney [mailto:w...@cloudera.com]
Sent: Saturday, April 23, 2016 11:07 PM
To: dev@arrow.apache.org; Micah Kornfield <emkornfi...@gmail.com>
Subject: Re: Byte ordering/Endianness revisited

I don't see a problem adding endianness as a flag in the IPC metadata, and raising exceptions if big-endian data is ever encountered for the time being. Since big-endian hardware is so exotic nowadays, I don't think it's unreasonable to expect IBM or other hardware vendors requiring big-endian support to contribute the byte-swapping logic when the time comes. I suppose this just means we'll have to be careful in code reviews should any algorithms get written that assume a particular endianness.

Will defer to others' judgment on this ultimately, though.

On Fri, Apr 22, 2016 at 11:59 PM, Micah Kornfield <emkornfi...@gmail.com> wrote:
> This was discussed on a previous thread
> (https://mail-archives.apache.org/mod_mbox/arrow-dev/201604.mbox/%3CCAKa9qDkppFrJQCHsSN7CmkJCzOTAhGPERMd_u2CMZANNQGtNyw%40mail.gmail.com%3E
> -- the relevant snippet is pasted below). But I'd like to reopen this
> because it appears Spark supports big-endian systems (high-end IBM
> hardware). Right now the spec says:
>
> "The Arrow format is little endian."
>
> I'd like to change this to something like:
>
> "Algorithms written against Arrow Arrays should assume native
> byte-ordering.
> Endianness is communicated via IPC/RPC metadata, and
> conversion to native byte-ordering is handled via IPC/RPC
> implementations."
>
> What do other people think?
>
> My assumption is that most deployments for the systems we are
> targeting are going to be homogeneous in terms of byte ordering. I
> think this can allow initial implementations to ignore support for
> non-native byte ordering (i.e. raise an exception if detected).
> Has this been others' experience?
>
> Thanks,
> Micah
>
> Snippet from the original thread:
>
>>> 1. For completeness it might be useful to add a statement that the
>>> byte order (endianness) is platform native.
>
>> Actually, Arrow is little-endian. It is an oversight that we haven't
>> documented it as such. One of the key capabilities is to push it
>> across the wire between separate systems without serialization (not
>> just IPC). As such, we have to pick an endianness. If there is a huge
>> need for a second big-endian encoding, we'll need to extend the spec
>> to support that as a property.
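[Editor's note: for illustration, the "exception if detected" behavior proposed in this thread can be sketched as below. This is a hypothetical sketch, not Arrow's actual API: the `endianness` metadata key and the `check_schema_endianness` helper are invented names standing in for whatever flag the IPC metadata would carry.]

```python
import sys

NATIVE = sys.byteorder  # "little" or "big" on the running host

def check_schema_endianness(schema_metadata: dict) -> None:
    """Raise if the declared wire endianness differs from the host's.

    Hypothetical sketch: the "endianness" key is an assumed stand-in
    for a flag in the IPC metadata, not a real Arrow field name.
    """
    # An absent flag defaults to little-endian, matching the spec text
    # quoted above ("The Arrow format is little endian.").
    wire = schema_metadata.get("endianness", "little")
    if wire != NATIVE:
        raise NotImplementedError(
            f"received {wire}-endian data on a {NATIVE}-endian host; "
            "byte-swapping is not yet implemented"
        )

# Matching metadata passes silently; mismatched metadata raises,
# i.e. the "raise an exception if detected" behavior proposed above.
check_schema_endianness({"endianness": sys.byteorder})
```

This keeps homogeneous deployments on the zero-copy fast path while leaving room for a vendor-contributed byte-swapping path later, as Wes suggests.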