This Parquet bug only triggers when some row groups are either empty or
contain only null binary values. So it is still safe to turn it on if all
columns are of boolean, numeric, or non-null binary types.
You may turn it on with SET spark.sql.parquet.filterPushdown=true
Michael,
Thanks. Is this still turned off in the 1.2 release? Is it possible to
turn it on just to get an idea of how much of a difference it makes?
-Jerry
On 05/12/14 12:40 am, Michael Armbrust wrote:
I'll add that some of our data formats will actually infer this sort of
useful information automatically. Both Parquet and cached in-memory tables
keep statistics on the min/max value of each column. When you have
predicates over these sorted columns, partitions will be eliminated if they
can't possibly contain matching values.
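To illustrate the idea outside of Spark, here is a minimal, self-contained Python sketch of min/max-statistics-based row-group elimination; the RowGroup class and function names are hypothetical, not Parquet's or Spark's actual API:

```python
# Toy model of min/max statistics pruning: each "row group" stores its
# values plus min/max stats, and a scan with a `col > threshold` predicate
# skips any group whose stats prove no row can match.

class RowGroup:
    def __init__(self, values):
        self.values = values
        self.min = min(values)
        self.max = max(values)

def scan_greater_than(groups, threshold):
    """Return values > threshold, skipping groups whose max <= threshold."""
    results = []
    skipped = 0
    for g in groups:
        if g.max <= threshold:      # stats rule out any matching row
            skipped += 1
            continue                # group is eliminated without reading rows
        results.extend(v for v in g.values if v > threshold)
    return results, skipped

groups = [RowGroup([1, 2, 3]), RowGroup([4, 5, 6]), RowGroup([7, 8, 9])]
matches, skipped = scan_greater_than(groups, 6)
# matches == [7, 8, 9]; two of the three row groups were never scanned
```

The same principle lets Parquet and cached in-memory tables answer selective predicates while reading only a fraction of the stored data.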
You can try writing your own Relation with filter push-down, or use
ParquetRelation2 as a workaround.
(https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala)
Cheng Hao
-----Original Message-----
From: Jerry Raj [mailto:jerry@gma