Re: [DISCUSS] Support to group rows by column ordinals

Jeyhun Karimov Fri, 21 Jun 2024 10:15:01 -0700

Hi Sergey,

Thanks for your comments.


Could you please elaborate more on use cases of this feature?


IMHO, the main use-cases can be
- simplifying some SQL queries (e.g., with queries including many/long
column names)
- having consistency when refactoring (e.g., columns are renamed)
- handling calculated columns (e.g., for columns with large expressions)

Giving the fact that was already mentioned by Timo in the PR
> > In some DBMSs it is common to write GROUP BY 1 or ORDER BY 1 for global
> aggregation/sorting.
> and IMHO referencing by ordinals might be error prone if someone adds
> more columns in SELECT and forgets about ordinals.


Yes, I completely agree. I also expressed similar ideas about the cons of
the feature above.

Would it make sense to consider enabling reference by aliases as
> another option here?
> Or did I miss anything?


I think so. But this is orthogonal to this issue [1] (deciding whether to
provide the proposed feature or not), no?
WDYT?


Regards,
Jeyhun

[1] https://issues.apache.org/jira/browse/FLINK-34366


On Thu, Jun 20, 2024 at 1:27 PM Sergey Nuyanzin <snuyan...@gmail.com> wrote:

> Hey Jeyhun,
>
> Thanks for starting the discussion.
> Could you please elaborate more on use cases of this feature?
>
> The one that I see in FLINK-34366[1] is to simplify referencing to
> aliases in SELECT from GROUP BY
> (also potentially ORDER BY and HAVING). I wonder whether there is some
> other use cases where
> support of addressing by ordinals is required?
>
> I'm asking since SqlConformance in Calcite and as a result
> FlinkSqlConformance in Flink give ability
> to reference to aliases from SELECT by enabling it e.g. [2] where javadoc
> says
> >   * Whether to allow aliases from the {@code SELECT} clause to be used as
> >   * column names in the {@code GROUP BY} clause.
>
> Giving the fact that was already mentioned by Timo in the PR
> > In some DBMSs it is common to write GROUP BY 1 or ORDER BY 1 for global
> aggregation/sorting.
> and IMHO referencing by ordinals might be error prone if someone adds
> more columns in SELECT and forgets about ordinals.
>
> Would it make sense to consider enabling reference by aliases as
> another option here?
> Or did I miss anything?
>
> [1] https://issues.apache.org/jira/browse/FLINK-34366
> [2]
> https://github.com/apache/calcite/blob/c0a53f6b17daaca9d057e70d7fae0a0e9c2cd02a/core/src/main/java/org/apache/calcite/sql/validate/SqlConformance.java#L92-L103
>
> On Thu, Jun 20, 2024 at 12:12 PM Muhammet Orazov
> <mor+fl...@morazow.com.invalid> wrote:
> >
> > Hey Jeyhun,
> >
> > Thanks for bringing it up! +-1 from my side.
> >
> > Personally, I find this feature confusing, it feels always natural to
> > use
> > column names. SQL power users will ask for it, I have seen it used in
> > automated complex queries also.
> >
> > But it seems counterintuitive to enable flag for this feature. Enabling
> > it, should not disable grouping/ordering by the column names?
> >
> > Best,
> > Muhammet
> >
> >
> > On 2024-06-17 20:30, Jeyhun Karimov wrote:
> > > Hi devs,
> > >
> > > I am moving our discussion on the PR thread [1] to the dev mailing list
> > > to
> > > close the loop on the related issue [2]. The end goal of the PR is to
> > > support grouping/ordering by via column ordinals. The target
> > > implementation
> > > (currently there is no flag) should support a flag, so that a user can
> > > also
> > > use the old behavior as suggested by @Timo.
> > >
> > > Some vendors such as Postgres [3], SQLite [4], MySQL/MariaDB [5],
> > > Oracle
> > > [6], Spark [7], and BigQuery[8] support group/order by clauses with
> > > column
> > > ordinals.
> > >
> > > Obviously, supporting this clause might lead to less readable and
> > > maintainable SQL code. This might also cause a bit of complications
> > > both on
> > > the codebase and on the user-experience side. On the other hand, we
> > > already
> > > see that numerous vendors support this feature out of the box, because
> > > there was/is a need for this feature.
> > >
> > > That is why, I would like to discuss and hear your opinions about
> > > introducing/abandoning this feature.
> > >
> > > Regards,
> > > Jeyhun
> > >
> > > [1] https://github.com/apache/flink/pull/24270
> > > [2] https://issues.apache.org/jira/browse/FLINK-34366
> > > [3] https://www.postgresql.org/docs/6.5/sql-select.htm
> > > [4] https://www.sqlite.org/lang_select.html
> > > [5] https://www.db-fiddle.com/f/uTrfRrNs4uXLr4Q9j2piCk/1
> > > [6]
> > >
> https://oracle-base.com/articles/23/group-by-and-having-clause-using-column-alias-or-column-position-23
> > > [7]
> > >
> https://github.com/apache/spark/commit/90613df652d45e121ab2b3a5bbb3b63cb15d297a
> > > [8]
> > >
> https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#group_by_col_ordinals
>
>
>
> --
> Best regards,
> Sergey
>

Re: [DISCUSS] Support to group rows by column ordinals

Reply via email to