Let me add a bit more color since I'm the Shepherd. I've fixed quite a few bugs in the analyzer that were caused by rule-order issues. The recent ones are https://github.com/apache/spark/pull/45718 and https://github.com/apache/spark/pull/45350 . Dealing with rule order is very tricky, and making all the analyzer rules orthogonal is nearly impossible. Following other mainstream databases and using a single-pass analyzer is definitely the right direction.
This is a tough project and will likely take years. To reduce risk, it will not change the codebase invasively. The majority of the new analyzer will live in new code files, and only minor refactorings are needed to reuse some existing analyzer rules. The new analyzer will only be enabled in dedicated tests that will be newly built for it, so you should never hit issues caused by the new analyzer in the existing tests.

On Thu, Sep 19, 2024 at 5:01 PM Reynold Xin <r...@databricks.com.invalid> wrote:

> Great document! Thanks for writing it up.
>
> On Tue, Sep 10, 2024 at 10:00 AM Vladimir Golubev <vvdr....@gmail.com>
> wrote:
>
>> Hey folks, following up on the recent single-pass Analyzer discussion. I
>> made a high-level proposal document for this idea:
>> https://docs.google.com/document/d/1dWxvrJV-0joGdLtWbvJ0uNyTocDMJ90rPRNWa4T56Og.
>> Feel free to comment!