On 12 Jan 2016, at 10:49, Reynold Xin <r...@databricks.com> wrote:
> How big a deal is this use case in a heterogeneous-endianness environment? If we do want to fix it, we should do it right before Spark shuffles data, to minimize the performance penalty, i.e. turn big-endian encoded data into little-endian encoded data before it goes on the wire. This is a pretty involved change, and given the other things that might break across heterogeneous-endianness environments, I am not sure it is high-priority enough to even warrant review bandwidth right now.

It's notable that Hadoop doesn't like mixed endianness either; there is work (primarily from Oracle) to make its byte-swapping consistent, that is, to work reliably on big-endian systems (https://issues.apache.org/jira/browse/HADOOP-11505). There's no motivation to support mixed-endian clusters. The majority of clusters are x86, and there are only three CPU families that run big-endian: SPARC, POWER and ARM. Adam has clearly been playing with POWER + x86, but I'd suspect that's experimentation, not production.

What is probably worth checking is mixed endianness between client apps submitting work and the servers: Java and Kryo serialization should handle that automatically.
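To make the "fix the byte order before it goes on the wire" idea concrete, here's a minimal sketch, not Spark's actual shuffle code, of pinning a wire format to one byte order with java.nio.ByteBuffer; the EndianNormalize object and its helpers are made up for illustration:

    import java.nio.{ByteBuffer, ByteOrder}

    object EndianNormalize {
      // Encode longs in a fixed little-endian layout regardless of the
      // host CPU's native order, so a reader on any platform can decode
      // the block the same way.
      def encodeLongs(values: Array[Long]): Array[Byte] = {
        val buf = ByteBuffer.allocate(values.length * java.lang.Long.BYTES)
        buf.order(ByteOrder.LITTLE_ENDIAN) // pin the wire format explicitly
        values.foreach(v => buf.putLong(v))
        buf.array()
      }

      def decodeLongs(bytes: Array[Byte]): Array[Long] = {
        val buf = ByteBuffer.wrap(bytes)
        buf.order(ByteOrder.LITTLE_ENDIAN) // must match the writer's order
        Array.fill(bytes.length / java.lang.Long.BYTES)(buf.getLong)
      }

      def main(args: Array[String]): Unit = {
        println(s"native order: ${ByteOrder.nativeOrder()}")
        val roundTrip = decodeLongs(encodeLongs(Array(1L, 1L << 56)))
        println(roundTrip.mkString(", ")) // same values on any host
      }
    }

The cost Reynold mentions shows up in encodeLongs: on a big-endian host every value gets byte-swapped on the write path, which is exactly the shuffle-time penalty being weighed.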
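On that last point, the reason Java serialization is endian-portable is that java.io.DataOutputStream (which ObjectOutputStream builds on) is specified to write multi-byte values in big-endian network order whatever the host CPU is; Kryo likewise defines its own byte layout rather than using the host's. A quick demonstration:

    import java.io.{ByteArrayOutputStream, DataOutputStream}

    object NetworkOrderDemo {
      def main(args: Array[String]): Unit = {
        val bytes = new ByteArrayOutputStream()
        val out = new DataOutputStream(bytes)
        out.writeInt(0x01020304) // DataOutput is specified as big-endian
        out.flush()
        // Prints "01 02 03 04" on x86, POWER, SPARC and ARM alike,
        // so a client and server never disagree on byte order.
        println(bytes.toByteArray.map(b => f"${b & 0xff}%02x").mkString(" "))
      }
    }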