Re: Spark SQL with a sorted file

2014-12-23 Thread Cheng Lian
This Parquet bug is triggered only when some row groups are either empty or contain only null binary values. So it’s still safe to turn filter pushdown on if all columns are boolean, numeric, or non-null binary. You may turn it on by |SET spark.sql.parquet.filterPushdown=tr
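The safety condition described above can be sketched as a small check over a table's column list. This is only an illustration in plain Python, not Spark code: the `pushdown_is_safe` helper, the type-name strings, and the `(type_name, has_nulls)` representation are all assumptions made for the example, and the check does not detect empty row groups, which the message names as the other trigger.

```python
# Hypothetical type names for illustration; not Spark's or Parquet's identifiers.
ALWAYS_SAFE_TYPES = {"boolean", "int", "long", "float", "double"}


def pushdown_is_safe(columns):
    """Return True if every column is boolean, numeric, or a binary with no nulls.

    `columns` is a list of (type_name, has_nulls) pairs.
    """
    for type_name, has_nulls in columns:
        if type_name in ALWAYS_SAFE_TYPES:
            continue
        if type_name == "binary" and not has_nulls:
            continue
        # Any other case (e.g. a binary column containing nulls) is treated as unsafe.
        return False
    return True


safe_table = [("int", True), ("double", False), ("binary", False)]
risky_table = [("int", False), ("binary", True)]

safe_result = pushdown_is_safe(safe_table)
risky_result = pushdown_is_safe(risky_table)
```

Here `safe_table` passes because its only binary column has no nulls, while `risky_table` is rejected because its binary column may produce an all-null row group.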

Re: Spark SQL with a sorted file

2014-12-22 Thread Jerry Raj
Michael, Thanks. Is this still turned off in the released 1.2? Is it possible to turn it on just to get an idea of how much of a difference it makes? -Jerry On 05/12/14 12:40 am, Michael Armbrust wrote: I'll add that some of our data formats will actually infer this sort of useful information a

Re: Spark SQL with a sorted file

2014-12-04 Thread Michael Armbrust
I'll add that some of our data formats will actually infer this sort of useful information automatically. Both Parquet and cached in-memory tables keep statistics on the min/max value for each column. When you have predicates over these sorted columns, partitions will be eliminated if they can't pos
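The min/max idea described above can be sketched in a few lines of plain Python. This is a conceptual illustration only, not how Spark SQL or Parquet implement it: the names `ColumnStats`, `build_stats`, `may_contain_range`, and `scan` are hypothetical, partitions are modeled as non-empty lists of integers, and the predicate is a simple closed range.

```python
from typing import List, NamedTuple, Tuple


class ColumnStats(NamedTuple):
    min_value: int
    max_value: int


def build_stats(partitions: List[List[int]]) -> List[ColumnStats]:
    # One (min, max) pair per partition; assumes every partition is non-empty.
    return [ColumnStats(min(p), max(p)) for p in partitions]


def may_contain_range(stats: ColumnStats, low: int, high: int) -> bool:
    # A partition can hold a match only if [min, max] overlaps [low, high].
    return stats.max_value >= low and stats.min_value <= high


def scan(partitions: List[List[int]], stats: List[ColumnStats],
         low: int, high: int) -> Tuple[List[int], int]:
    matched: List[int] = []
    scanned = 0
    for rows, s in zip(partitions, stats):
        if not may_contain_range(s, low, high):
            continue  # eliminated using statistics alone, rows never read
        scanned += 1
        matched.extend(v for v in rows if low <= v <= high)
    return matched, scanned


# Sorted data split into partitions: each partition covers a disjoint range.
partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
stats = build_stats(partitions)
result, scanned = scan(partitions, stats, 5, 6)
```

With the data sorted, the ranges of the partitions do not overlap, so a range predicate touches only the partitions whose min/max interval intersects it. On unsorted data the same statistics exist, but most partitions tend to span the whole value range and few can be skipped.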

RE: Spark SQL with a sorted file

2014-12-03 Thread Cheng, Hao
You can try writing your own Relation with filter pushdown, or use ParquetRelation2 as a workaround. (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala) Cheng Hao -Original Message- From: Jerry Raj [mailto:jerry@gma