Doubts about SparkSQL

Renato Marroquín Mogrovejo Sat, 23 May 2015 09:53:40 -0700

Hi all,

I have a couple of questions about the latest SparkSQL.


1. The SparkSQL paper states that "The physical
planner also performs rule-based physical optimizations, such as pipelining
projections or filters into one Spark map operation. ..."

If dealing with a query of the form:

select * from (
    select * from tableA where date1 < '19-12-2015'
) A
where attribute1 = 'valueA' and attribute2 = 'valueB'

Can I be sure that both filters are applied sequentially in memory,
i.e. the date filter is applied first and the attribute filter is then
applied to that result set? Or will two separate map-only operations
be spawned?
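
To make the two alternatives concrete, here is a minimal plain-Python sketch
(not Spark code, and not a claim about what Catalyst actually does). The row
layout and the function names are hypothetical; the predicates mirror the
query above, including the string comparison on date1.

```python
# Plain-Python illustration only; no Spark API is used here.
# The sample rows and all helper names below are hypothetical.
rows = [
    {"date1": "01-12-2015", "attribute1": "valueA", "attribute2": "valueB"},
    {"date1": "25-12-2015", "attribute1": "valueA", "attribute2": "valueB"},
    {"date1": "05-12-2015", "attribute1": "valueX", "attribute2": "valueB"},
]


def date_filter(row):
    # Predicate of the inner query (string comparison, as in the SQL literal).
    return row["date1"] < "19-12-2015"


def attribute_filter(row):
    # Predicate of the outer query.
    return row["attribute1"] == "valueA" and row["attribute2"] == "valueB"


def two_passes(data):
    # Two separate operations: the intermediate result is fully built first.
    intermediate = [r for r in data if date_filter(r)]
    return [r for r in intermediate if attribute_filter(r)]


def one_pipelined_pass(data):
    # Both predicates are evaluated per row in a single traversal.
    return [r for r in data if date_filter(r) and attribute_filter(r)]


result_two = two_passes(rows)
result_one = one_pipelined_pass(rows)
```

Both functions return the same rows; the only difference is whether an
intermediate result set is built between the two filters, which is what the
question is about.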

2. Is the Catalyst query optimizer aware of how the data was partitioned,
or does it make no assumptions about this?

Thanks in advance!


Renato M.
