Thanks, Cheng, for the clarification. Looking forward to the new API mentioned below.
Yang

Sent from my iPad

> On Mar 17, 2015, at 8:05 PM, Cheng Lian <lian.cs....@gmail.com> wrote:
>
> Hey Yang,
>
> My comments are in-lined below.
>
> Cheng
>
>> On 3/18/15 6:53 AM, Yang Lei wrote:
>> Hello,
>>
>> I am migrating my Spark SQL external data source integration from Spark 1.2.x to Spark 1.3.
>>
>> I noticed there are a couple of new filters now, e.g. org.apache.spark.sql.sources.And. However, for a SQL query with the condition "A AND B", PrunedFilteredScan.buildScan still gets an Array[Filter] with two filters, A and B, whereas I had expected one "And" filter with left == A and right == B.
>>
>> So my first question is: where can I find the "rules" for converting a SQL condition into the filters passed to PrunedFilteredScan.buildScan?
>
> Top-level AND predicates are always broken into smaller sub-predicates. The And filter that appears in the external data sources API is for nested predicates, like A OR (NOT (B AND C)).
>
>> I do like what I see in these And, Or, and Not filters, where recursive nesting is allowed to connect filters together. If this is the direction we are heading, my second question is: could buildScan take just one Filter object instead of an Array[Filter]?
>
> For data sources with further filter push-down ability (e.g. Parquet), breaking down top-level AND predicates can be convenient.
>
>> The third question is: what is our plan for letting a relation provider inform Spark which filters it has already handled, so that there is no redundant filtering?
>
> Yeah, this is a good point. I guess we can add some method like "filterAccepted" to PrunedFilteredScan.
>
>> I would appreciate comments and links to any existing documentation or discussion.
>>
>>
>> Yang
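For reference, here is a minimal sketch of how the pieces discussed above fit together in the Spark 1.3 data sources API. The relation class SimpleRelation and the canEvaluate helper are made-up names for illustration only; the Filter case classes (EqualTo, And, Or, Not, ...) and the PrunedFilteredScan.buildScan signature come from org.apache.spark.sql.sources.

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources._
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical relation using the Spark 1.3 external data sources API.
case class SimpleRelation(sqlContext: SQLContext)
  extends BaseRelation with PrunedFilteredScan {

  override def schema: StructType = StructType(Seq(
    StructField("a", IntegerType),
    StructField("b", StringType)))

  // Decide recursively whether this source can evaluate a (possibly nested)
  // predicate. Top-level AND conjuncts arrive pre-split in the filters array,
  // so And only shows up inside nested expressions such as A OR (NOT (B AND C)).
  private def canEvaluate(f: Filter): Boolean = f match {
    case EqualTo(_, _) | GreaterThan(_, _) | LessThan(_, _) => true
    case And(left, right) => canEvaluate(left) && canEvaluate(right)
    case Or(left, right)  => canEvaluate(left) && canEvaluate(right)
    case Not(child)       => canEvaluate(child)
    case _                => false  // e.g. In, IsNull, IsNotNull not handled here
  }

  override def buildScan(requiredColumns: Array[String],
                         filters: Array[Filter]): RDD[Row] = {
    // Each element of `filters` is one top-level conjunct; together they are
    // implicitly AND-ed. A real implementation would push the ones it can
    // evaluate down to the underlying store and leave the rest to Spark.
    val (pushed, skipped) = filters.partition(canEvaluate)
    println(s"pushing down: ${pushed.mkString(", ")}; skipping: ${skipped.mkString(", ")}")
    sqlContext.sparkContext.emptyRDD[Row]
  }
}

Note that in 1.3 Spark still re-applies every filter on top of the rows returned by buildScan, so pushing down only a subset (or nothing) is safe; a "filterAccepted"-style hook as proposed above would be an optimization that lets Spark skip that second pass.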