Yu Xu created CALCITE-7517:
------------------------------

             Summary: Support DISTINCT ON clause in SELECT statements
                 Key: CALCITE-7517
                 URL: https://issues.apache.org/jira/browse/CALCITE-7517
             Project: Calcite
          Issue Type: Improvement
          Components: core
    Affects Versions: 1.41.0
            Reporter: Yu Xu
            Assignee: Yu Xu
             Fix For: 1.42.0


Currently Caclite not support DISTINCT ON syntax, such as:

 
{code:java}
SELECT DISTINCT ON (deptno) empno, ename
FROM emp
ORDER BY deptno, hiredate DESC;{code}
 

many mainstream databases support this syntax, such as Postgresql ClickHouse:

PostgreSql: https://www.postgresql.org/docs/18/sql-select.html

ClickHouse: 
[https://clickhouse.com/docs/sql-reference/statements/select/distinct]

Benefits of introducing {{DISTINCT ON}} syntax into Calcite:

*1. PostgreSQL Compatibility*

{{DISTINCT ON}} is a standard PostgreSQL extension. Supporting it allows 
Calcite-based systems (and downstream projects like Apache Flink, Hive, etc.) 
to parse and execute SQL written for PostgreSQL without manual rewrite.

*2.Cleaner Semantics for First-Row-per-Group Queries*

It solves the common "get the first row per partition" pattern directly and 
declaratively. Without {{{}DISTINCT ON{}}}, users must write verbose 
{{ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)}} window expressions 
wrapped in a CTE or sub-query, which is harder to read and maintain.

*3.Reduced Boilerplate*

Compare the two approaches:
{code:java}
-- With DISTINCT ON (concise, intent is obvious)
SELECT DISTINCT ON (deptno) empno, ename
FROM emp
ORDER BY deptno, hiredate DESC;

-- Without it (verbose, error-prone)
WITH ranked AS (
  SELECT empno, ename,
    ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY hiredate DESC) AS rn
  FROM emp
)
SELECT empno, ename FROM ranked WHERE rn = 1; {code}
*4.Better Integration with ORDER BY and LIMIT*

Because {{DISTINCT ON}} preserves the original {{ORDER BY}} semantics (it only 
deduplicates based on the specified prefix), {{LIMIT}} and {{OFFSET}} behave 
naturally after the deduplication step. The optimizer can also reason about the 
sort order more easily than with an opaque window-filter pattern.

*5. Easier for Query Builders and BI Tools*

Many tools generate SQL programmatically. Having a single clause for "keep the 
best match per group" simplifies code generation and avoids the complexity of 
synthesizing window functions and outer filters.

 

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to