I have submitted a PR to Parquet to support dictionaries when filters are pushed down, which should reduce binary conversion overhead when Spark pushes down string predicates on columns that are dictionary encoded.
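To illustrate the idea, here is a toy sketch only. It is not the code in the PR and does not use Parquet's API; the function names and the list-based column model are hypothetical. It assumes a dictionary-encoded column is a list of distinct binary values plus one integer id per row. The predicate is evaluated once per dictionary entry rather than once per row, so each distinct string is converted from binary only once.

```python
# Toy model of a dictionary-encoded string column.
# All names here are hypothetical; this is not Parquet's API.
from typing import Callable, List


def filter_decoding_every_value(dictionary: List[bytes], ids: List[int],
                                predicate: Callable[[str], bool]) -> List[int]:
    """Baseline: convert binary -> str for every row, then test the predicate."""
    return [row for row, i in enumerate(ids)
            if predicate(dictionary[i].decode("utf-8"))]


def filter_using_dictionary(dictionary: List[bytes], ids: List[int],
                            predicate: Callable[[str], bool]) -> List[int]:
    """Convert and test each distinct dictionary entry once, then filter rows by id."""
    matches = [predicate(entry.decode("utf-8")) for entry in dictionary]
    return [row for row, i in enumerate(ids) if matches[i]]


dictionary = [b"spark", b"parquet", b"thrift"]  # distinct values
ids = [0, 1, 1, 2, 0, 1]                        # one dictionary id per row


def is_parquet(value: str) -> bool:
    return value == "parquet"


rows = filter_using_dictionary(dictionary, ids, is_parquet)
```

Both functions return the same row indices; the second does three conversions here instead of six, and the gap grows with the number of rows per distinct value.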
The PR is here: https://github.com/apache/incubator-parquet-mr/pull/117

It's blocked at the moment because part of my Parquet build fails on my Mac, due to a problem getting Thrift 0.7 installed. The installation instructions provided by Parquet do not seem to work, I think because of this issue: https://issues.apache.org/jira/browse/THRIFT-2229

This is not directly related to Spark, but I wondered if anyone has got Thrift 0.7 working on Mac OS X Yosemite (10.10), or can suggest a workaround.

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Optimize-encoding-decoding-strings-when-using-Parquet-tp10141p10617.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.