On 12 Jan 2016, at 10:49, Reynold Xin <r...@databricks.com> wrote:
> How big a deal is this use case in a heterogeneous-endianness environment? If we do want to fix it, we should do it right before Spark shuffles data, to minimize the performance penalty, i.e. turn big-endian encoded data into little-endian encoded data before it goes on the wire. This is a pretty involved change, and given the other things that might break across heterogeneous-endianness environments, I am not sure it is high-priority enough to even warrant review bandwidth right now.

It's notable that Hadoop doesn't like mixed endianness either; there is work (primarily from Oracle) to make its byte-swapping consistent, that is, to work reliably on big-endian systems (https://issues.apache.org/jira/browse/HADOOP-11505). There's no motivation to support mixed-endian clusters. The majority of clusters are x86, and there are only three CPU families that run big-endian: SPARC, POWER and ARM. Adam has clearly been playing with POWER + x86, but I'd suspect that's experimentation, not production.

What is probably worth checking is mixed endianness between client apps submitting work and the servers: Java and Kryo serialization should handle that automatically.
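To make the "fix the byte order before it goes on the wire" idea concrete, here's a minimal sketch, not Spark's actual shuffle code, of pinning a wire format to one byte order with java.nio.ByteBuffer; the EndianNormalize object and its helpers are made up for illustration:

    import java.nio.{ByteBuffer, ByteOrder}

    object EndianNormalize {
      // Encode longs in a fixed little-endian layout regardless of the
      // host CPU's native order, so a reader on any platform can decode
      // the block the same way.
      def encodeLongs(values: Array[Long]): Array[Byte] = {
        val buf = ByteBuffer.allocate(values.length * java.lang.Long.BYTES)
        buf.order(ByteOrder.LITTLE_ENDIAN) // pin the wire format explicitly
        values.foreach(v => buf.putLong(v))
        buf.array()
      }

      def decodeLongs(bytes: Array[Byte]): Array[Long] = {
        val buf = ByteBuffer.wrap(bytes)
        buf.order(ByteOrder.LITTLE_ENDIAN) // must match the writer's order
        Array.fill(bytes.length / java.lang.Long.BYTES)(buf.getLong)
      }

      def main(args: Array[String]): Unit = {
        println(s"native order: ${ByteOrder.nativeOrder()}")
        val roundTrip = decodeLongs(encodeLongs(Array(1L, 1L << 56)))
        println(roundTrip.mkString(", ")) // same values on any host
      }
    }

The cost Reynold mentions shows up in encodeLongs: on a big-endian host every value gets byte-swapped on the write path, which is exactly the shuffle-time penalty being weighed.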
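On that last point, the reason Java serialization is endian-portable is that java.io.DataOutputStream (which ObjectOutputStream builds on) is specified to write multi-byte values in big-endian network order whatever the host CPU is; Kryo likewise defines its own byte layout rather than using the host's. A quick demonstration:

    import java.io.{ByteArrayOutputStream, DataOutputStream}

    object NetworkOrderDemo {
      def main(args: Array[String]): Unit = {
        val bytes = new ByteArrayOutputStream()
        val out = new DataOutputStream(bytes)
        out.writeInt(0x01020304) // DataOutput is specified as big-endian
        out.flush()
        // Prints "01 02 03 04" on x86, POWER, SPARC and ARM alike,
        // so a client and server never disagree on byte order.
        println(bytes.toByteArray.map(b => f"${b & 0xff}%02x").mkString(" "))
      }
    }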