Hi all,

While working on SPARK-24051, I realized that in the Optimizer, and in
every other place where we transform a query plan, we currently lack
context information about what is in scope and what is not.

Coming back to the ticket, the bug reported there has two main causes:
 1 - we have two aliases in different places of the plan;
 2 - (the focus of this email) we apply all the rules globally over the
whole plan, without any notion of the scope in which something is
reachable/visible.

I will start with an easy example to explain what I mean. If we have a
simple query like:

select a, b from (
  select 1 as a, 2 as b from table1
    union
  select 3 as a, 4 as b from table2) q

We produce a tree which is logically something like:

Project0 (a, b)
-   Union
--    Project1 (a, b)
---     ScanTable1
--    Project2 (a, b)
---     ScanTable2

So when we apply a transformation on Project1, for instance, we have no
information about what is coming from ScanTable1 (or, in general, from any
node in the subtree rooted at Project1). We lack a stateful transform
that lets the children tell their parent, grandparents, and so on what is
in their scope. This is particularly true for Attributes: inside a node we
have no way of knowing whether an Attribute comes from its subtree (i.e.
it is in scope) or not.
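
To make the idea concrete, here is a rough sketch of what such a stateful
bottom-up transform could look like. It is a toy model in Python, not
Catalyst code: Node, transform_up_with_scope and record_scope are
hypothetical names, the attribute ids (a#1, x#10, ...) are made up, and
"scope" is assumed to mean every attribute produced anywhere in the
node's subtree.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, FrozenSet, List, Tuple


@dataclass
class Node:
    """Toy plan node (hypothetical, not a Catalyst class)."""
    name: str
    output: Tuple[str, ...] = ()      # attributes this node produces
    references: Tuple[str, ...] = ()  # attributes this node's expressions read
    children: List["Node"] = field(default_factory=list)


Rule = Callable[[Node, FrozenSet[str]], Node]


def transform_up_with_scope(node: Node, rule: Rule) -> Tuple[Node, FrozenSet[str]]:
    """Apply `rule` bottom-up, passing each node the attributes in its scope.

    Returns the rewritten node together with the set of attributes produced
    in its subtree, so that the parent can use it as (part of) its own scope.
    """
    new_children = []
    scope: FrozenSet[str] = frozenset()
    for child in node.children:
        new_child, child_produced = transform_up_with_scope(child, rule)
        new_children.append(new_child)
        scope |= child_produced
    new_node = rule(replace(node, children=new_children), scope)
    return new_node, scope | frozenset(new_node.output)


# The plan from the example above, with made-up attribute ids.
scan1 = Node("ScanTable1", output=("x#10",))
project1 = Node("Project1", output=("a#1", "b#2"), children=[scan1])
scan2 = Node("ScanTable2", output=("y#20",))
project2 = Node("Project2", output=("a#3", "b#4"), children=[scan2])
union = Node("Union", output=("a#1", "b#2"), children=[project1, project2])
project0 = Node("Project0", output=("a#1", "b#2"),
                references=("a#1", "b#2"), children=[union])

seen = {}


def record_scope(node: Node, scope: FrozenSet[str]) -> Node:
    """Example rule: leave the node unchanged, only record what it can see."""
    seen[node.name] = scope
    return node


new_plan, produced = transform_up_with_scope(project0, record_scope)
```

With something like this, a rule running on Project1 would see only the
attributes coming from ScanTable1, and could tell that a#3 (produced under
Project2) is not in its scope.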

So, the point of this email is: do you think it would be useful in general
to introduce a way of navigating the tree that lets the children keep
state to be used by their parents? Or, more broadly, do you think it would
be useful to introduce the concept of scope (i.e. whether an attribute can
be accessed by a given node of a plan)?

Thanks,
Marco
