Thanks Cheng for the clarification. 

Looking forward to the new API mentioned below. 

Yang

Sent from my iPad

> On Mar 17, 2015, at 8:05 PM, Cheng Lian <lian.cs....@gmail.com> wrote:
> 
> Hey Yang,
> 
> My comments are in-lined below.
> 
> Cheng
> 
>> On 3/18/15 6:53 AM, Yang Lei wrote:
>> Hello, 
>> 
>> I am migrating my Spark SQL external datasource integration from Spark 1.2.x 
>> to Spark 1.3. 
>> 
>> I noticed there are a couple of new filters now, e.g. 
>> org.apache.spark.sql.sources.And. However, for a SQL query with the 
>> condition "A AND B", PrunedFilteredScan.buildScan still gets an 
>> Array[Filter] with two filters, A and B, whereas I expected to get a single 
>> "And" filter with left == A and right == B.
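>> 
>> For reference, here is a minimal sketch of the 1.3 relation I am 
>> implementing against (the class name and schema are just placeholders):
>> 
>>     import org.apache.spark.rdd.RDD
>>     import org.apache.spark.sql.{Row, SQLContext}
>>     import org.apache.spark.sql.sources._
>>     import org.apache.spark.sql.types._
>> 
>>     class MyRelation(val sqlContext: SQLContext)
>>       extends BaseRelation with PrunedFilteredScan {
>> 
>>       override def schema: StructType = StructType(
>>         StructField("a", IntegerType) :: StructField("b", IntegerType) :: Nil)
>> 
>>       override def buildScan(
>>           requiredColumns: Array[String],
>>           filters: Array[Filter]): RDD[Row] = {
>>         // For "WHERE a = 1 AND b = 2" this receives
>>         // Array(EqualTo("a", 1), EqualTo("b", 2)), not a single And(...).
>>         filters.foreach(println)
>>         sqlContext.sparkContext.emptyRDD[Row]
>>       }
>>     }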
>> 
>> So my first question is: where can I find the "rules" for converting a 
>> SQL condition into the filters passed to PrunedFilteredScan.buildScan?
> Top-level AND predicates are always broken into smaller sub-predicates. The 
> And filter that appears in the external data sources API is for nested 
> predicates, like A OR (NOT (B AND C)).
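> 
> As a quick illustration (column names made up), the filters a relation sees 
> for the two shapes of predicates look like this:
> 
>     import org.apache.spark.sql.sources._
> 
>     // WHERE a = 1 AND b = 2 -- the top-level AND is split before buildScan:
>     val topLevel: Array[Filter] = Array(EqualTo("a", 1), EqualTo("b", 2))
> 
>     // WHERE a = 1 OR (NOT (b = 2 AND c = 3)) -- And only shows up nested
>     // inside Or/Not:
>     val nested: Array[Filter] =
>       Array(Or(EqualTo("a", 1), Not(And(EqualTo("b", 2), EqualTo("c", 3)))))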
>> 
>> I do like what I see in these And, Or, Not filters, where we allow 
>> recursively nested definitions to connect filters together. If this is the 
>> direction we are heading in, my second question is: do we need just one 
>> Filter object instead of an Array[Filter] in buildScan?
> For data sources with further filter push-down abilities (e.g. Parquet), 
> breaking down top-level AND predicates can be convenient.
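> 
> For example (just a sketch; the set of comparisons a source can handle is 
> made up here), with the top-level AND already flattened, a source can pick 
> out the filters it can push to the underlying store with a plain partition 
> and let Spark evaluate the rest:
> 
>     import org.apache.spark.sql.sources._
> 
>     def splitFilters(filters: Array[Filter]): (Array[Filter], Array[Filter]) =
>       filters.partition {
>         case _: EqualTo | _: GreaterThan | _: LessThan => true  // pushed to the store
>         case _ => false                                         // evaluated by Spark
>       }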
>> 
>> The third question is: what is our plan for allowing a relation provider to 
>> inform Spark which filters it has already handled, so that there is no 
>> redundant filtering?
> Yeah, this is a good point. I guess we can add some method like 
> "filterAccepted" to PrunedFilteredScan.
>> 
>> Appreciate comments and links to any existing documentation or discussion.
>> 
>> 
>> Yang
> 
