Michael,

What I'm seeing (in Spark 1.2.0) is that the required columns pushed down
to the DataRelation are not derived from the SELECT clause, but rather are
just the columns explicitly referenced in the WHERE clause.

Examples from my testing:

SELECT * FROM myTable --> The required columns are empty.
SELECT key1 FROM myTable --> The required columns are empty.
SELECT * FROM myTable WHERE key1 = 'val1' --> The required columns contain
key1.
SELECT key1, key2 FROM myTable WHERE key1 = 'val1' --> The required columns
contain key1.
SELECT key1, key2 FROM myTable WHERE key1 = 'val1' AND key2 = 'val2' --> The
required columns contain key1, key2.
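
For anyone reproducing this, here is a minimal sketch of the kind of relation
I mean, against the 1.2.0 data sources API (the relation name and schema are
made up for illustration; only the buildScan signature and the logging
matter):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext, StringType, StructField, StructType}
import org.apache.spark.sql.sources.{Filter, PrunedFilteredScan}

// Bare-bones relation that just logs what Spark pushes down.
// requiredColumns is where I would expect the SELECT list to arrive.
case class MyTableRelation(sqlContext: SQLContext) extends PrunedFilteredScan {

  override def schema: StructType = StructType(Seq(
    StructField("key1", StringType, nullable = true),
    StructField("key2", StringType, nullable = true)))

  override def buildScan(requiredColumns: Array[String],
                         filters: Array[Filter]): RDD[Row] = {
    println("requiredColumns = " + requiredColumns.mkString(","))
    println("filters         = " + filters.mkString(","))
    // A real relation would scan the underlying store (Accumulo in my case)
    // and prune columns / apply filters server-side before serializing rows.
    sqlContext.sparkContext.parallelize(Seq.empty[Row])
  }
}

Running the five queries above against a relation along these lines should
show the requiredColumns values listed.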

I created SPARK-5296 to get the OR predicate pushed down in some capacity.
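
For what it's worth, here is roughly how I would hope to consume those
filters on my side once that lands (the toServerSideExpr name is invented,
and the Or case is hypothetical for 1.2.0; it is exactly what SPARK-5296
asks for):

import org.apache.spark.sql.sources.{EqualTo, Filter}

// Each element of the filters array passed to buildScan is an implicit
// conjunct, which is why plain AND clauses already work in 1.2.0.
// A disjunction needs its own Filter case; that is what SPARK-5296 is about.
def toServerSideExpr(filter: Filter): Option[String] = filter match {
  case EqualTo(attribute, value) => Some(attribute + " = '" + value + "'")
  // Hypothetical shape once SPARK-5296 lands, e.g. an Or(left, right) filter:
  // case Or(left, right) =>
  //   for (l <- toServerSideExpr(left); r <- toServerSideExpr(right))
  //     yield "(" + l + " OR " + r + ")"
  case _ => None // anything untranslated stays with Spark's own evaluation
}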

On Sat, Jan 17, 2015 at 3:38 PM, Michael Armbrust <mich...@databricks.com>
wrote:

> 1) The fields in the SELECT clause are not pushed down to the predicate
>> pushdown API. I have many optimizations that allow fields to be filtered
>> out before the resulting object is serialized on the Accumulo tablet
>> server. How can I get the selection information from the execution plan?
>> I'm a little hesitant to implement the data relation that allows me to see
>> the logical plan because it's noted in the comments that it could change
>> without warning.
>>
>
> I'm not sure I understand.  The list of required columns should be pushed
> down to the data source.  Are you looking for something more complicated?
>
>
>> 2) I'm surprised to find that the predicate pushdown filters get
>> completely removed when I do anything more complex in a where clause other
>> than simple AND statements. Using an OR statement caused the filter array
>> that was passed into the PrunedFilteredDataSource to be empty.
>>
>
> This was just an initial cut at the set of predicates to push down.  We
> can add Or.  Mind opening a JIRA?
>
