Pajaraja commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r1981202703
########## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ########## @@ -1032,6 +1044,42 @@ object ColumnPruning extends Rule[LogicalPlan] { p } + case p @ Project(_, ul: UnionLoop) => + if (!ul.outputSet.subsetOf(p.references)) { + val newAnchorChildProj = prunedChild(ul.anchor, p.references) + val neededIndicesListRef = { Review Comment: This doesn't work because at every step we need to produce something readable for the next iteration of UnionLoopRef (except if we truly do not need it). An example is the Fibonacci generation: ``` WITH RECURSIVE fibonacci AS ( VALUES (0, 1) AS t(a, b) UNION ALL SELECT b, a + b FROM fibonacci WHERE a < 10 ) SELECT a FROM fibonacci ORDER BY a; ``` This will fail with the above approach, as it will prune the recursive child to only return the next fibonacci number in the first iteration of the recursion. However, when trying to calculate the next fibonacci number, since we only memorized the last one, we won't be able to preform the calculation. Now as for my approach, its pretty similar, just that we merge two sets of projection references to obtain everything we need to keep: the project right above UnionLoop (which we also consider in your approach), OR a project right above UnionLoopRef (which may not exist in which case we don't prune at all). For this I create two sets of indices (one for each project) and merge them. Then I take make the pruned children in a way similar to what you suggested (I also started from modifying how pruning Union is handled). The reason I opted for sets of indices over sets of objects is because the columns down from UnionLoopRef might get renamed or given a different id in the process, but since the output of UnionLoopRef and UnionLoop should be of the same arity (and have the same types - they should logically correspond to each other!), the indices should correspond to the things we need accordingly. I do notice that I made a small mistake in creating indicesForRef which should be fixed now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org