Re: [DISCUSS] Support GROUP BY ALL (CALCITE-7594)

Tisya Bhatia via dev Wed, 10 Jun 2026 08:43:31 -0700

On the case where we're grouping only by constants over empty input:

I checked the SQL standard and major dialects; they're pretty unanimous
that the output should be 0 rows. Calcite already matches this because of
the AggregateProjectPullUpConstantsRule (it won't reduce a non-empty GROUP
BY to the empty one, it keeps at least one key), so GROUP BY ALL (which
expands to the explicit GROUP BY <constants> returns 0 over empty input. I
added an execution test to confirm this.


Best,
Tisya Bhatia

On Tue, Jun 9, 2026 at 1:58 PM Julian Hyde <[email protected]> wrote:

> > As for the 0-vs-1
> > row question, happy to leave that to the existing behavior (and the
> > language lawyers).
>
> There’s no existing behavior, so you need to decide what is the right
> behavior, and test it. See what the other implementations do.
>
> > On Jun 9, 2026, at 11:14 AM, Tisya Bhatia via dev <
> [email protected]> wrote:
> >
> > Hey Julian -
> >
> > Glad you're on board. Good point about the "contains an aggregate" vs.
> "is
> > an aggregate." My implementation (PR-pending) already does this: it
> expands
> > GROUP BY ALL to every SELECT expression that does not contain an
> aggregate,
> > using a recursive aggregate finder. So your example expands exactly as
> you
> > wrote.
> >
> > I'll also take note of measures - thanks, I'd missed that. And I'll add
> > your example as a test case.
> >
> > Regarding the degenerate case (grouping only by constants over an empty
> > input): GROUP BY ALL there expands to the exact same query as explicitly
> > writing GROUP BY 'x', so it introduces no new semantics - it inherits
> > whatever Calcite already does for constant-only grouping. As for the
> 0-vs-1
> > row question, happy to leave that to the existing behavior (and the
> > language lawyers).
> >
> > Thanks,
> > Tisya Bhatia
> >
> > On Tue, Jun 9, 2026 at 12:57 PM Julian Hyde <[email protected]
> <mailto:[email protected]>> wrote:
> >
> >> I agree; we should support GROUP BY ALL.
> >>
> >> “Isn’t an aggregate” should be broadened to something like “doesn’t
> >> contain an aggregate and isn’t a measure”.
> >>
> >>  SELECT deptno, SUBSTR(job, 1), count(*) + 1 as c, ‘x' as x
> >>  FROM emps
> >>  GROUP BY ALL
> >>
> >> should be equivalent to
> >>
> >>  SELECT deptno, SUBSTR(job, 1), count(*) + 1 as c, ‘x' as x
> >>  FROM emps
> >>  GROUP BY deptno, SUBSTR(job, 1), ‘x'
> >>
> >> I’m guessing that’s what the major implementations do. Measures are a
> >> Calcite-specific thing.
> >>
> >> Note that the constant ‘x’ gets swept into the GROUP BY clause but has
> no
> >> effect because it’s a constant.
> >> When there are no rows and we’re only grouping by constants we have a
> >> degenerate case and the language lawyers can argue about whether it
> should
> >> return 0 or 1 rows:
> >>
> >>  SELECT ‘x’ as x, COUNT(*)
> >>  FROM emps
> >>  WHERE FALSE
> >>  GROUP BY ALL
> >>
> >> Julian
> >>
> >>
> >>
> >>> On Jun 9, 2026, at 10:25 AM, Tisya Bhatia via dev <
> >> [email protected]> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> I'd like to propose support for GROUP BY ALL in Calcite - a shorthand
> >> that
> >>> groups every expression in the SELECT clause that isn't an aggregate.
> >>>
> >>> Here is the Jira ticket:
> >>
> https://urldefense.com/v3/__https://issues.apache.org/jira/browse/CALCITE-7594__;!!Ayb5sqE7!pEba7l9-97xJOhAdqD6pEBkcgCqK3bHZKZBQDYL8-qHcbXxmWfNc0DmlEL7m8Qm9RF2bQ-92FfXAOcqWtWv7jdJL$
> >>>
> >>> Context:
> >>> - Today, if I want to group by every non-aggregated expression in the
> >>> SELECT list, I would have to repeat them in the group by.
> >>> - A GROUP BY ALL shorthand has become convergence across warehouse
> >>> dialects, and supporting it keeps Calcite-based dialects updated and
> >>> aligned.
> >>>
> >>> Proposal:
> >>> - Add GROUP BY ALL: when ALL appears with no grouping items, group by
> >> every
> >>> SELECT expression that is not an aggregate or window function. It's
> >>> implemented as a parser marker that the validator rewrites into the
> >>> concrete grouping expressions before normal group validation runs, so
> the
> >>> converter / optimizer never see a marker.
> >>> - Alternatives considered: I kept ALL (matches DuckDB, Spark, and
> >> others).
> >>> The existing ALL / DISTINCT set quantifier on grouping items
> >> (CALCITE-5089,
> >>> e.g. GROUP BY ALL CUBE(a, b)) is fully preserved. The new meaning only
> >>> applies when no grouping items follow ALL.
> >>> - Expected impact / tradeoff: purely additive. Bare GROUP BY ALL does
> not
> >>> parse today, so enabling it cannot change the meaning of any existing
> >>> query. SELECT * with GROUP BY ALL is rejected with a clear error for
> now
> >>> (the star isn't expanded at validation time)
> >>>
> >>> For example:
> >>>
> >>> SELECT deptno, job, SUM(sal)
> >>> FROM emp
> >>> GROUP BY ALL;
> >>>
> >>> resolves to
> >>>
> >>> SELECT deptno, job, SUM(sal)
> >>> FROM emp
> >>> GROUP BY deptno, job;
> >>>
> >>> Open questions:
> >>> - Since this feature is additive, I don't think a conformance flag is
> >>> needed, but would we prefer one?
> >>> - Should SELECT * be supported in this proposal, or is the explicit
> error
> >>> enough for an initial cut?
> >>>
> >>> If this direction sounds reasonable, I have a prototype ready (parser +
> >>> validator + tests + reference docs) and will put up the PR soon. A
> >>> companion ORDER BY ALL proposal (CALCITE-7597) is in the same vein.
> >>>
> >>> Best,
> >>> Tisya Bhatia
>
>

Re: [DISCUSS] Support GROUP BY ALL (CALCITE-7594)

Reply via email to