Hi all, I have a couple of questions about the latest version of Spark SQL.
1. The Spark SQL paper states: "The physical planner also performs rule-based physical optimizations, such as pipelining projections or filters into one Spark map operation. ..." Given a query of the form:

    select * from (
      select * from tableA where date1 < '19-12-2015'
    ) A
    where attribute1 = 'valueA' and attribute2 = 'valueB'

can I be sure that both filters are applied sequentially in memory, i.e. the date filter is applied first and the attribute filters are then applied to that result set? Or will two separate map-only operations be spawned?

2. Is the Catalyst query optimizer aware of how the data was partitioned, or does it make no assumptions about this?

Thanks in advance!
Renato M.
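
P.S. To make question 1 concrete, here is a plain-Python sketch of the two behaviours I am trying to tell apart. It is not Spark code and says nothing about what Catalyst actually does. The function names and sample rows are hypothetical, and I wrote the dates in ISO format (unlike the query above) so that plain string comparison orders them correctly.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]
Predicate = Callable[[Row], bool]


def two_passes(rows: Iterable[Row], first: Predicate, second: Predicate) -> List[Row]:
    """Unfused: materialize the output of the first filter, then scan it again."""
    intermediate = [row for row in rows if first(row)]
    return [row for row in intermediate if second(row)]


def one_pass(rows: Iterable[Row], first: Predicate, second: Predicate) -> List[Row]:
    """Fused: test each row against both predicates within a single scan."""
    return [row for row in rows if first(row) and second(row)]


# Hypothetical sample data standing in for tableA.
table_a: List[Row] = [
    {"date1": "2015-12-01", "attribute1": "valueA", "attribute2": "valueB"},
    {"date1": "2015-12-25", "attribute1": "valueA", "attribute2": "valueB"},
    {"date1": "2015-11-30", "attribute1": "other", "attribute2": "valueB"},
]


def date_filter(row: Row) -> bool:
    # Inner query: date1 earlier than 19 December 2015.
    return row["date1"] < "2015-12-19"


def attr_filter(row: Row) -> bool:
    # Outer query: both attribute conditions must hold.
    return row["attribute1"] == "valueA" and row["attribute2"] == "valueB"


fused = one_pass(table_a, date_filter, attr_filter)
unfused = two_passes(table_a, date_filter, attr_filter)
```

Both variants return the same rows; they differ only in whether an intermediate result is built between the filters. I assume the physical plan printed by explain() would show which of the two Spark chooses for the query above, but I would appreciate confirmation.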