I don’t think the goal of always preserving column names through rewrite rules is achievable. Just consider what happens when two RelSets with different column names merge, or when we write to a summary table, or when we add project “shims” above and below an aggregate and those shims may or may not be flattened away.
We should of course try to make each rule smarter about preserving names. We have been doing this for years. But we need a plan B. During SQL generation we have the opportunity to work top-down, to generate the column names based on the names in the consumer. It’s much easier to do the right thing at that point.

> On Jun 25, 2025, at 8:28 AM, Mihai Budiu <mbu...@gmail.com> wrote:
>
> I think the goal is not to preserve column names in all RelNodes when a plan
> is rewritten; this is not even well-defined, since there is no 1-1
> correspondence between Rel nodes in a new plan and in an old plan.
>
> The goal is for every rewrite rule to produce a plan with the exact same
> output ROW type, including column names. I think this goal is certainly
> achievable; there should be an assert after rule application that this holds.
> At the very least this can be done by inserting a Project which renames
> columns to their original names.
>
> This is similar to https://issues.apache.org/jira/browse/CALCITE-7058
> "Decorrelator may produce different column names". I think that this goal can
> even be achieved for the decorrelator itself.
>
> Mihai
>
>
> ________________________________
> From: Julian Hyde <jhyde.apa...@gmail.com>
> Sent: Wednesday, June 25, 2025 7:59 AM
> To: dev@calcite.apache.org <dev@calcite.apache.org>
> Subject: Re: [DISCUSS] Preserving Output Alias Names After RelNode Optimization
>
> Preserving column names through the optimization process, and many rewrite
> rules being applied, is very hard if not impossible.
>
> Instead I would approach this as the RelToSqlConverter does, and try to
> produce the best/most concise/most human-readable SQL possible given a
> RelRoot. Flattening the subquery that is generated to project/rename the
> output columns of a Sort seems a reasonable thing for RelToSqlConverter
> to do. My approach would be to write a unit test and torture the code until
> it passes. :)
>
>> On Jun 25, 2025, at 3:07 AM, Yanjing Wang <zhuangzixiao...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I'd like to discuss a challenge regarding the preservation of column
>> aliases after RelNode optimization in Apache Calcite. Let me outline the
>> specific problem and potential approaches.
>>
>> Problem Statement:
>>
>> Currently, when applying RelNode optimization rules, Calcite doesn't
>> preserve the original output column aliases. While using RelRoot seems like
>> a potential solution, it introduces complications when the optimal RelNode
>> is a Sort or similar node. Consider this scenario with a best rel:
>>
>>   SELECT column1, ... FROM ... ORDER BY column1 DESC
>>
>> If we try to preserve aliases using RelRoot, it might generate SQL like:
>>
>>   SELECT column1 AS alias1, ... FROM ( SELECT ... ORDER BY column1 DESC ) t1
>>
>> This transformation can break the ORDER BY clause functionality in many
>> compute engines.
>>
>> Questions for Discussion:
>>
>> 1. Is there an existing mature solution in Calcite for maintaining output
>>    alias consistency after optimization?
>> 2. If not, what would be the recommended approach when dealing with Sort
>>    nodes (or similar RelNodes) that could be affected by RelRoot-based alias
>>    preservation?
>> 3. If we implement a new solution, should we introduce additional
>>    RelOptRules to optimize the resulting RelNode structure? This would
>>    ensure we maintain both alias consistency and query performance.
>>
>> I'd appreciate your thoughts and suggestions on this matter, especially
>> from those who have encountered similar challenges.
>>
>> Best regards,
>> Yanjing Wang
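A minimal sketch of the flattening Julian describes for Yanjing's Sort case. It uses a hypothetical toy model, not Calcite's actual RelToSqlConverter or RelRoot API: `SortedScan`, `wrapped`, and `flattened` are illustrative names only. `wrapped` reproduces the shape from the question, where the renaming happens in an outer query around the ORDER BY. `flattened` merges the aliases into the Sort's own SELECT, so the ORDER BY stays in the outermost query.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these types and methods are hypothetical, not Calcite's
// RelNode / RelRoot / RelToSqlConverter API.
public class RenameFlattening {

    /** SELECT fields FROM table ORDER BY sortField [DESC]. */
    record SortedScan(String table, List<String> fields, String sortField, boolean descending) {}

    /** Builds "field AS alias" items, omitting AS when the alias equals the field name. */
    static String selectList(List<String> fields, List<String> aliases) {
        if (fields.size() != aliases.size()) {
            throw new IllegalArgumentException(
                "expected " + fields.size() + " aliases, got " + aliases.size());
        }
        List<String> items = new ArrayList<>();
        for (int i = 0; i < fields.size(); i++) {
            String field = fields.get(i);
            String alias = aliases.get(i);
            items.add(field.equals(alias) ? field : field + " AS " + alias);
        }
        return String.join(", ", items);
    }

    static String orderBy(String key, boolean descending) {
        return " ORDER BY " + key + (descending ? " DESC" : "");
    }

    /** The shape from the question: an outer query renames the sorted subquery's columns. */
    static String wrapped(SortedScan plan, List<String> aliases) {
        String inner = "SELECT " + String.join(", ", plan.fields()) + " FROM " + plan.table()
            + orderBy(plan.sortField(), plan.descending());
        return "SELECT " + selectList(plan.fields(), aliases) + " FROM (" + inner + ") t1";
    }

    /** The rename merged into the Sort's own SELECT; ORDER BY stays in the outermost query. */
    static String flattened(SortedScan plan, List<String> aliases) {
        String select = selectList(plan.fields(), aliases); // also validates the alias count
        int idx = plan.fields().indexOf(plan.sortField());
        if (idx < 0) {
            throw new IllegalArgumentException("unknown sort field " + plan.sortField());
        }
        // Order by the output alias of the sort column, which SQL allows in the same query.
        return "SELECT " + select + " FROM " + plan.table()
            + orderBy(aliases.get(idx), plan.descending());
    }

    public static void main(String[] args) {
        SortedScan plan = new SortedScan("t", List.of("column1", "column2"), "column1", true);
        List<String> aliases = List.of("alias1", "column2");
        System.out.println(wrapped(plan, aliases));
        System.out.println(flattened(plan, aliases));
    }
}
```

With columns `column1, column2`, aliases `alias1, column2`, and a descending sort on `column1`, `flattened` yields `SELECT column1 AS alias1, column2 FROM t ORDER BY alias1 DESC`. The sketch orders by the output alias; it ignores identifier quoting, computed expressions, and aliases that collide with other input column names, all of which a real converter would have to handle.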