Yu Xu created CALCITE-7517:
------------------------------
Summary: Support DISTINCT ON clause in SELECT statements
Key: CALCITE-7517
URL: https://issues.apache.org/jira/browse/CALCITE-7517
Project: Calcite
Issue Type: Improvement
Components: core
Affects Versions: 1.41.0
Reporter: Yu Xu
Assignee: Yu Xu
Fix For: 1.42.0
Currently Caclite not support DISTINCT ON syntax, such as:
{code:java}
SELECT DISTINCT ON (deptno) empno, ename
FROM emp
ORDER BY deptno, hiredate DESC;{code}
many mainstream databases support this syntax, such as Postgresql ClickHouse:
PostgreSql: https://www.postgresql.org/docs/18/sql-select.html
ClickHouse:
[https://clickhouse.com/docs/sql-reference/statements/select/distinct]
Benefits of introducing {{DISTINCT ON}} syntax into Calcite:
*1. PostgreSQL Compatibility*
{{DISTINCT ON}} is a standard PostgreSQL extension. Supporting it allows
Calcite-based systems (and downstream projects like Apache Flink, Hive, etc.)
to parse and execute SQL written for PostgreSQL without manual rewrite.
*2.Cleaner Semantics for First-Row-per-Group Queries*
It solves the common "get the first row per partition" pattern directly and
declaratively. Without {{{}DISTINCT ON{}}}, users must write verbose
{{ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)}} window expressions
wrapped in a CTE or sub-query, which is harder to read and maintain.
*3.Reduced Boilerplate*
Compare the two approaches:
{code:java}
-- With DISTINCT ON (concise, intent is obvious)
SELECT DISTINCT ON (deptno) empno, ename
FROM emp
ORDER BY deptno, hiredate DESC;
-- Without it (verbose, error-prone)
WITH ranked AS (
SELECT empno, ename,
ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY hiredate DESC) AS rn
FROM emp
)
SELECT empno, ename FROM ranked WHERE rn = 1; {code}
*4.Better Integration with ORDER BY and LIMIT*
Because {{DISTINCT ON}} preserves the original {{ORDER BY}} semantics (it only
deduplicates based on the specified prefix), {{LIMIT}} and {{OFFSET}} behave
naturally after the deduplication step. The optimizer can also reason about the
sort order more easily than with an opaque window-filter pattern.
*5. Easier for Query Builders and BI Tools*
Many tools generate SQL programmatically. Having a single clause for "keep the
best match per group" simplifies code generation and avoids the complexity of
synthesizing window functions and outer filters.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)