Re: [DISCUSS] [Spark SQL] Single-pass Analyzer SPIP

Wenchen Fan Tue, 24 Sep 2024 04:13:14 -0700

Let me add a bit more color since I'm the Shepherd.

I've fixed quite a few analyzer bugs caused by rule-order issues. The
recent ones are https://github.com/apache/spark/pull/45718 and
https://github.com/apache/spark/pull/45350 . Dealing with rule order is
very tricky, and making all the analyzer rules orthogonal is nearly
impossible. Following other mainstream databases and using a single-pass
analyzer is definitely the right direction.

This is a tough project and will likely take years. To reduce risk, it
will not change the codebase invasively. Most of the new analyzer will
live in new code files, and only minor refactoring is needed to reuse
some existing analyzer rules. The new analyzer will only be enabled in
dedicated tests newly built for it, so you should never hit issues caused
by the new analyzer in the existing tests.

On Thu, Sep 19, 2024 at 5:01 PM Reynold Xin <r...@databricks.com.invalid>
wrote:

> Great document! Thanks for writing it up.
>
> On Tue, Sep 10, 2024 at 10:00 AM Vladimir Golubev <vvdr....@gmail.com>
> wrote:
>
>> Hey folks, following up on the recent single-pass Analyzer discussion. I
>> made a high-level proposal document for this idea:
>> https://docs.google.com/document/d/1dWxvrJV-0joGdLtWbvJ0uNyTocDMJ90rPRNWa4T56Og.
>> Feel free to comment!
>>
>
