Re: [DISCUSS] [Spark SQL] A single-pass resolution approach for the Catalyst Analyzer

2024-08-26 Thread Wenchen Fan
+1. The analyzer rule order issue has bitten me multiple times and it's very hard to make your analyzer rule bug-free if it interacts with other rules.

On Wed, Aug 21, 2024 at 2:49 AM Reynold Xin wrote:
> +1 on this too
>
> When I implemented "group by all", I introduced at least two subtle bugs

Re: [DISCUSS] [Spark SQL] A single-pass resolution approach for the Catalyst Analyzer

2024-08-20 Thread Reynold Xin
+1 on this too. When I implemented "group by all", I introduced at least two subtle bugs that many reviewers weren't able to catch, and those two bugs would not have been possible to introduce if we had a single-pass analyzer. Single pass can make the whole framework more robust.

On Tue, Aug 2

[DISCUSS] [Spark SQL] A single-pass resolution approach for the Catalyst Analyzer

2024-08-20 Thread Xiao Li
This sounds like a good idea! The Analyzer is complex. The changes in the new Analyzer should not affect the existing one. Users could add QO (query optimizer) rules that rely on the existing structures and patterns of the logical plan trees generated by the current one. The new Analyzer needs to generate t

Re: [Spark SQL] A single-pass resolution approach for the Catalyst Analyzer

2024-08-14 Thread Vladimir Golubev
- I think we can rely on the current tests. One possibility would be to dual-run both Analyzer implementations if `Utils.isTesting` and compare the (normalized) logical plans.
- We can implement the analyzer functionality by milestones (Milestone 0: Project, Filter, UnresolvedInlineTable, Milestone
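The dual-run idea can be sketched roughly as follows. This is a minimal Python illustration, not Spark code: it assumes each analyzer is a hypothetical callable that turns SQL text into a printed plan string, uses a hypothetical `is_testing` flag as a stand-in for `Utils.isTesting`, and picks one possible normalization (renumbering expression IDs such as `a#42` in order of first appearance). Real plan normalization may need to cover more than expression IDs.

```python
import re
from typing import Callable, Dict

# Matches the numeric part of expression IDs as printed in plans, e.g. "a#42".
_EXPR_ID = re.compile(r"#(\d+)")


def normalize_plan_string(plan: str) -> str:
    """Renumber expression IDs in order of first appearance, so two analyzers
    that allocate different IDs can still produce identical strings."""
    mapping: Dict[str, int] = {}

    def renumber(match: re.Match) -> str:
        old_id = match.group(1)
        if old_id not in mapping:
            mapping[old_id] = len(mapping)
        return f"#{mapping[old_id]}"

    return _EXPR_ID.sub(renumber, plan)


# Hypothetical analyzer signature: SQL text in, printed logical plan out.
Analyzer = Callable[[str], str]


def dual_run(sql: str, fixed_point: Analyzer, single_pass: Analyzer,
             is_testing: bool) -> str:
    """Always return the fixed-point result; when testing, also run the
    single-pass analyzer and fail loudly if the normalized plans differ."""
    expected = fixed_point(sql)
    if is_testing:
        actual = single_pass(sql)
        if normalize_plan_string(expected) != normalize_plan_string(actual):
            raise AssertionError(
                f"analyzer mismatch for {sql!r}:\n{expected}\n---\n{actual}")
    return expected
```

For example, `normalize_plan_string("Project [a#12, b#13]")` yields `"Project [a#0, b#1]"`. Returning the fixed-point plan in every case keeps user-visible behavior unchanged while the single-pass implementation is only checked against it.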

Re: [Spark SQL] A single-pass resolution approach for the Catalyst Analyzer

2024-08-14 Thread Herman van Hovell
+1(000) on this! This should massively reduce allocations done in the analyzer, and it is much more efficient. I also can't count the times that I had to increase the number of iterations. This sounds like a no-brainer to me.

I do have two questions:
- How do we ensure that we don't accidenta

[Spark SQL] A single-pass resolution approach for the Catalyst Analyzer

2024-08-09 Thread Vladimir Golubev
Hello All,

I recently did some research in the Catalyst Analyzer area to check whether it's possible to make it single-pass instead of fixed-point. Despite the flexibility of the current fixed-point approach (new functionality means a new rule), it has some drawbacks. The dependencies between the rules are u
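To make the contrast concrete, here is a toy Python sketch. It is not Catalyst code: the `Relation`/`Project` classes, the catalog, and the rule names are all hypothetical. The fixed-point driver re-applies every rule over the whole tree until the plan stops changing, so the number of iterations depends on rule order (which is why an iteration limit can be hit). The single-pass resolver instead visits each node once, resolving a child before its parent.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical catalog: table name -> column names.
CATALOG: Dict[str, Tuple[str, ...]] = {"t": ("a", "b", "c")}


@dataclass(frozen=True)
class Relation:
    name: str
    output: Tuple[str, ...] = ()
    resolved: bool = False


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]
    child: "Plan"
    resolved: bool = False

    @property
    def output(self) -> Tuple[str, ...]:
        return self.columns


Plan = Union[Relation, Project]
Rule = Callable[[Plan], Plan]


def resolve_relation(plan: Plan) -> Plan:
    """Look up an unresolved table in the catalog."""
    if isinstance(plan, Relation) and not plan.resolved:
        return replace(plan, output=CATALOG[plan.name], resolved=True)
    return plan


def resolve_project(plan: Plan) -> Plan:
    """Resolve a projection, but only once its child is already resolved."""
    if isinstance(plan, Project) and not plan.resolved and plan.child.resolved:
        missing = [c for c in plan.columns if c not in plan.child.output]
        if missing:
            raise ValueError(f"cannot resolve columns {missing}")
        return replace(plan, resolved=True)
    return plan


def transform_up(plan: Plan, rule: Rule) -> Plan:
    """Apply one rule to every node, children before parents."""
    if isinstance(plan, Project):
        plan = replace(plan, child=transform_up(plan.child, rule))
    return rule(plan)


def fixed_point(plan: Plan, rules: List[Rule],
                max_iterations: int = 100) -> Tuple[Plan, int]:
    """Re-run the whole rule batch until the plan stops changing."""
    for iteration in range(1, max_iterations + 1):
        before = plan
        for rule in rules:
            plan = transform_up(plan, rule)
        if plan == before:
            return plan, iteration
    raise RuntimeError(f"no fixed point after {max_iterations} iterations")


def resolve_single_pass(plan: Plan) -> Plan:
    """Visit each node exactly once: resolve the child, then the node itself."""
    if isinstance(plan, Relation):
        return resolve_relation(plan)
    if isinstance(plan, Project):
        child = resolve_single_pass(plan.child)
        return resolve_project(replace(plan, child=child))
    raise TypeError(f"unsupported node: {type(plan).__name__}")
```

For `Project(("a", "b"), Relation("t"))`, the rule order `[resolve_project, resolve_relation]` needs three fixed-point iterations (two that change the plan, plus one to confirm nothing changed), while `[resolve_relation, resolve_project]` needs two. The single-pass resolver reaches the same plan without depending on any rule order, at the cost of having to encode the resolution order explicitly per node type.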