Re: Big-Endian (IBM Power7) Spark Serialization issue

2014-06-17 Thread gchen
Cool, so maybe when we switch to Snappy instead of LZF, we can work around the bug until the LZF upstream fixes it, right? In addition, would it be valuable to add support for other compression codecs such as LZ4? We observed a 5% end-to-end improvement using LZ4 vs. Snappy in Terasort (Hadoop MR).
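
As a rough sketch of the workaround discussed here: Spark 1.x documents a `spark.io.compression.codec` setting, and a default `SparkConf` picks up `spark.*` Java system properties. The class name `CodecWorkaround` and the helper `selectSnappy` are hypothetical names used only for illustration, and the property and codec class names should be checked against the Spark version in use.

```java
public class CodecWorkaround {
    // Setting name and codec class as documented for Spark 1.x (assumption: verify for your version).
    static final String CODEC_KEY = "spark.io.compression.codec";
    static final String SNAPPY_CODEC = "org.apache.spark.io.SnappyCompressionCodec";

    /** Selects Snappy instead of the LZF default; call before any SparkContext is created. */
    static void selectSnappy() {
        System.setProperty(CODEC_KEY, SNAPPY_CODEC);
    }

    public static void main(String[] args) {
        selectSnappy();
        // Print the effective value so the override is visible in the driver log.
        System.out.println(CODEC_KEY + "=" + System.getProperty(CODEC_KEY));
    }
}
```

The same key can presumably also be passed on the command line as a `-D` JVM option, which avoids touching application code.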

Re: Big-Endian (IBM Power7) Spark Serialization issue

2014-06-16 Thread gchen
I didn't find ning's source code in the Spark git repository (or maybe I missed it?), so the next time we hit a bug caused by third-party code, can we do something (to fix the bug) based on the Spark repository?

Re: Big-Endian (IBM Power7) Spark Serialization issue

2014-06-16 Thread gchen
Surely the community's kind support is essential:) -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Big-Endian-IBM-Power7-Spark-Serialization-issue-tp7003p7018.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Re: Big-Endian (IBM Power7) Spark Serialization issue

2014-06-16 Thread gchen
Hi Reynold, thanks for your interest in this issue. The work here is part of incorporating Spark into the PowerLinux ecosystem. Here is the bug raised in ning by my colleague: https://github.com/ning/compress/issues/37 Would you mind sharing some insights on Spark's support for Big Endian A

Re: Big-Endian (IBM Power7) Spark Serialization issue

2014-06-15 Thread gchen
To anyone who is interested in this issue: the root cause is in third-party code, the com.ning.compress.lzf.impl.UnsafeChunkEncoderBE class, which has a broken implementation. A bug will be raised in the Ning project, thanks.
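
The thread does not show the faulty code, so the following is only a general illustration of why byte order matters to an encoder that reads multi-byte values out of a byte array, not a reproduction of the bug. The `BE` suffix presumably marks the big-endian variant of the encoder. `ByteOrderDemo` and `readInt` are hypothetical names; only standard `java.nio` calls are used.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderDemo {
    /** Reads the first four bytes of data as an int using the given byte order. */
    static int readInt(byte[] data, ByteOrder order) {
        return ByteBuffer.wrap(data).order(order).getInt(0);
    }

    public static void main(String[] args) {
        byte[] data = {0x01, 0x02, 0x03, 0x04};
        // Native order is BIG_ENDIAN on big-endian hardware and LITTLE_ENDIAN on x86.
        System.out.println("native order: " + ByteOrder.nativeOrder());
        // The same four bytes decode to different ints depending on the order assumed.
        System.out.printf("big-endian:    0x%08X%n", readInt(data, ByteOrder.BIG_ENDIAN));
        System.out.printf("little-endian: 0x%08X%n", readInt(data, ByteOrder.LITTLE_ENDIAN));
    }
}
```

Code that reads in native order and then assumes a fixed layout can therefore work on one architecture and fail on the other, which is why a platform-specific code path may go untested on the less common platform.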