Hey Yang,
My comments are in-lined below.
Cheng
On 3/18/15 6:53 AM, Yang Lei wrote:
Hello,
I am migrating my Spark SQL external data source integration from Spark
1.2.x to Spark 1.3.
I noticed there are a couple of new filters now, e.g.
org.apache.spark.sql.sources.And. However, for a SQL query with the
condition "A AND B", I noticed that PrunedFilteredScan.buildScan still
gets an Array[Filter] with the two filters A and B, whereas I expected
to get one "And" filter with left == A and right == B.
So my first question is: where can I find the "rules" for
converting a SQL condition into the filters passed to
PrunedFilteredScan.buildScan?
Top-level AND predicates are always broken into smaller sub-predicates.
The And filter that appears in the external data sources API is for nested
predicates, like A OR (NOT (B AND C)).
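For example (just a sketch, using the org.apache.spark.sql.sources case
classes from 1.3; the actual translation is done inside Catalyst):

    import org.apache.spark.sql.sources._

    // WHERE a > 1 AND b < 2
    // Top-level conjuncts are split, so buildScan receives two filters:
    val conjuncts: Array[Filter] =
      Array(GreaterThan("a", 1), LessThan("b", 2))

    // WHERE a > 1 OR NOT (b < 2 AND c = 3)
    // Nested boolean structure arrives as a single recursive filter:
    val nested: Array[Filter] =
      Array(Or(GreaterThan("a", 1), Not(And(LessThan("b", 2), EqualTo("c", 3)))))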
I do like what I see in these And, Or, Not filters, where recursive
nesting is allowed to connect filters together. If this is
the direction we are heading in, my second question is: do we just
need one Filter object instead of an Array[Filter] in buildScan?
For data sources with further filter push-down capabilities (e.g. Parquet),
breaking down top-level AND predicates can be convenient.
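For example, a data source can try to push down each top-level filter
independently and simply skip the ones it can't handle; Spark still
evaluates all the filters on top of the scan, so skipping is safe. A
minimal sketch, assuming a hypothetical translation into the underlying
store's predicate syntax:

    import org.apache.spark.sql.sources._

    // Returning None means "can't push this one down"; the rows are still
    // filtered again by Spark after buildScan, so correctness is preserved.
    def compileFilter(f: Filter): Option[String] = f match {
      case EqualTo(attr, value)     => Some(s"$attr = '$value'")
      case GreaterThan(attr, value) => Some(s"$attr > '$value'")
      case And(left, right) =>
        for (l <- compileFilter(left); r <- compileFilter(right))
          yield s"($l AND $r)"
      case _ => None
    }

    // Because top-level conjuncts arrive pre-split, each one can be pushed
    // down (or dropped) independently:
    def compileConjuncts(filters: Array[Filter]): Array[String] =
      filters.flatMap(compileFilter)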
My third question is: what is our plan to allow a relation provider
to inform Spark which filters it has already handled, so that there is
no redundant filtering?
Yeah, this is a good point. I guess we can add some method like
"filterAccepted" to PrunedFilteredScan.
Appreciate comments and links to any existing documentation or discussion.
Yang