Re: Eliminate partition filters in execution.Filter after filter pruning

2015-04-14 Thread Michael Armbrust
The contract of the DataSources API is that filters are advisory and you are allowed to ignore them. This is why we always evaluate them ourselves. Have you benchmarked your chan…
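The advisory-filter contract can be sketched in plain Python. This is not Spark's actual API: `scan_source`, `execute`, and the `honor_filters` flag are hypothetical names. A source that receives pushed-down filters may or may not apply them, so the engine re-applies them itself. The result is the same either way, at the cost of re-checking rows the source may already have filtered.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


def scan_source(rows: Iterable[Row], pushed: List[Predicate], honor_filters: bool) -> List[Row]:
    """Hypothetical data source: it MAY apply the pushed-down filters, or ignore them."""
    if honor_filters:
        return [r for r in rows if all(p(r) for p in pushed)]
    # Ignoring the pushed-down filters is allowed under the advisory contract.
    return list(rows)


def execute(rows: Iterable[Row], filters: List[Predicate], honor_filters: bool) -> List[Row]:
    # Hand the filters to the source as a hint...
    scanned = scan_source(rows, filters, honor_filters)
    # ...then always re-evaluate them, since the source may have skipped them.
    return [r for r in scanned if all(f(r) for f in filters)]


rows = [{"id": "a", "event": "click"}, {"id": "b", "event": "view"}]
is_click: Predicate = lambda r: r["event"] == "click"
print(execute(rows, [is_click], honor_filters=True))   # [{'id': 'a', 'event': 'click'}]
print(execute(rows, [is_click], honor_filters=False))  # [{'id': 'a', 'event': 'click'}]
```

Re-evaluating in the engine keeps the result correct even when a source ignores its filters. The trade-off is redundant work when the source did honor them.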

Re: Eliminate partition filters in execution.Filter after filter pruning

2015-04-14 Thread Yijie Shen
I’ve opened a PR on this: https://github.com/apache/spark/pull/5509

On April 14, 2015 at 11:57:34 AM, Yijie Shen (henry.yijies...@gmail.com) wrote:
> Hi, Suppose I have a table t(id: String, event: String) saved as parquet file, and have directory hierarchy: hdfs://path/to/data/root/dt=2015-01-…

Eliminate partition filters in execution.Filter after filter pruning

2015-04-13 Thread Yijie Shen
Hi, suppose I have a table t(id: String, event: String) saved as a Parquet file, with the directory hierarchy hdfs://path/to/data/root/dt=2015-01-01/hr=00. After partition discovery, the resulting schema should be (id: String, event: String, dt: String, hr: Int). If I have a query like: df.select(…
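A minimal sketch of the setup above, in plain Python. It is not taken from Spark or from the PR: `parse_partition`, `prune`, and `run_query` are hypothetical helpers. The type inference is simplified to "integer if it parses, else string". That is an assumption, though it does yield `dt: String, hr: Int` for this path. The `hr=01` directory and the row contents are made-up test data. In this sketch, once directories are pruned with a predicate on partition columns only, every surviving row already satisfies that predicate, so only the data predicate still needs to be evaluated per row.

```python
from typing import Callable, Dict, Iterable, List, Union

Value = Union[int, str]
Row = Dict[str, object]


def infer_value(raw: str) -> Value:
    """Simplified type inference (assumption): int if it parses as one, else string."""
    try:
        return int(raw)
    except ValueError:
        return raw


def parse_partition(path: str, root: str) -> Dict[str, Value]:
    """Map '<root>/dt=2015-01-01/hr=00' to {'dt': '2015-01-01', 'hr': 0}."""
    values: Dict[str, Value] = {}
    for segment in path[len(root):].strip("/").split("/"):
        key, sep, raw = segment.partition("=")
        if sep:  # only 'key=value' directory names become partition columns
            values[key] = infer_value(raw)
    return values


def prune(paths: Iterable[str], root: str,
          partition_pred: Callable[[Dict[str, Value]], bool]) -> List[str]:
    """Keep only directories whose partition values satisfy the predicate."""
    return [p for p in paths if partition_pred(parse_partition(p, root))]


def run_query(files: Dict[str, List[Row]], root: str,
              partition_pred: Callable[[Dict[str, Value]], bool],
              data_pred: Callable[[Row], bool]) -> List[Row]:
    result: List[Row] = []
    for path in prune(files, root, partition_pred):
        part = parse_partition(path, root)
        for row in files[path]:
            full = {**row, **part}  # append the discovered partition columns
            # partition_pred is already guaranteed True here, so only data_pred is checked.
            if data_pred(full):
                result.append(full)
    return result


root = "hdfs://path/to/data/root"
files = {  # hypothetical directory contents
    root + "/dt=2015-01-01/hr=00": [{"id": "1", "event": "click"}],
    root + "/dt=2015-01-01/hr=01": [{"id": "2", "event": "click"}],
}
print(run_query(files, root, lambda p: p["hr"] == 0, lambda r: r["event"] == "click"))
# [{'id': '1', 'event': 'click', 'dt': '2015-01-01', 'hr': 0}]
```

Here pruning is done by the engine itself, so the guarantee holds by construction. Whether it can be relied on for filters handed to an external data source is exactly the question raised in the reply above.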