Is there some transformation we'd want to apply to that tree, but can't because we have no concept of scope? It's already possible for a plan rule to traverse each node's subtree if it wants.
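
To make that concrete, here is a minimal sketch of how a rule could compute what is in scope for a node by walking its own subtree. It is a toy model in Python, not Catalyst code: `PlanNode`, `in_scope`, and `out_of_scope_references` are hypothetical names, the `#n` suffixes only imitate expression IDs, and the scan columns `c#10` and `c#11` are made up because the query in the quoted message never names any table column.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PlanNode:
    """Toy stand-in for a logical plan node (hypothetical, not Catalyst's LogicalPlan)."""
    name: str
    output: List[str]                                     # attributes this node produces
    references: List[str] = field(default_factory=list)   # attributes its expressions read
    children: List["PlanNode"] = field(default_factory=list)


def in_scope(node: PlanNode) -> Set[str]:
    """Attributes produced anywhere in the subtree strictly below `node`."""
    scope: Set[str] = set()
    for child in node.children:
        scope |= set(child.output)
        scope |= in_scope(child)
    return scope


def out_of_scope_references(plan: PlanNode) -> Dict[str, Set[str]]:
    """For every node in the plan, collect references that its subtree does not produce."""
    report: Dict[str, Set[str]] = {}
    stack = [plan]
    while stack:
        node = stack.pop()
        missing = set(node.references) - in_scope(node)
        if missing:
            report[node.name] = missing
        stack.extend(node.children)
    return report


# The tree from the quoted message, with made-up attribute IDs.
scan1 = PlanNode("ScanTable1", output=["c#10"])
scan2 = PlanNode("ScanTable2", output=["c#11"])
project1 = PlanNode("Project1", output=["a#1", "b#2"], children=[scan1])
project2 = PlanNode("Project2", output=["a#3", "b#4"], children=[scan2])
union = PlanNode("Union", output=["a#1", "b#2"], children=[project1, project2])
project0 = PlanNode("Project0", output=["a#1", "b#2"],
                    references=["a#1", "b#2"], children=[union])

# A deliberately broken Project1 that reads an attribute from the other Union branch.
bad_project1 = PlanNode("Project1", output=["a#1", "b#2"],
                        references=["a#3"], children=[scan1])
```

Here `in_scope(project1)` contains only what `ScanTable1` produces, and `out_of_scope_references(bad_project1)` flags `a#3` because it comes from the other branch of the Union. The sketch recomputes the scope for every node, which is quadratic in the worst case; a single bottom-up pass that caches each subtree's result would avoid that.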

On Tue, Apr 24, 2018 at 10:18 AM, Marco Gaido <marcogaid...@gmail.com> wrote:
> Hi all,
>
> working on SPARK-24051 I realized that currently in the Optimizer and in
> all the places where we are transforming a query plan, we are lacking the
> context information of what is in scope and what is not.
>
> Coming back to the ticket, the bug reported in the ticket is caused mainly
> by two reasons:
> 1 - we have two aliases in different places of the plan;
> 2 - (the focus of this email) we apply all the rules globally over the
> whole plan, without any notion of scope where something is
> reachable/visible or not.
>
> I will start with an easy example to explain what I mean. If we have a
> simple query like:
>
> select a, b from (
>   select 1 as a, 2 as b from table1
>   union
>   select 3 as a, 4 as b from table2) q
>
> We produce a tree which is logically something like:
>
> Project0 (a, b)
> - Union
> -- Project1 (a, b)
> --- ScanTable1
> -- Project2 (a, b)
> --- ScanTable2
>
> So when we apply a transformation on Project1 for instance, we have no
> information about what is coming from ScanTable1 (or in general any node
> which is part of the subtree whose root is Project1): we miss a stateful
> transform which allows the children to tell the parent, grandparents, and
> so on what is in their scope. This is in particular true for the
> Attributes: in a node we have no idea if an Attribute comes from its
> subtree (it is in scope) or not.
>
> So, the point of this email is: do you think in general might be useful to
> introduce a way of navigating the tree which allows the children to keep a
> state to be used by their parents? Or do you think it is useful in general
> to introduce the concept of scope (if an attribute can be accessed by a
> node of a plan)?
>
> Thanks,
> Marco