Re: [DISCUSS] Support GROUP BY ALL (CALCITE-7594)

Julian Hyde Wed, 10 Jun 2026 11:18:28 -0700
+1

> On Jun 10, 2026, at 8:42 AM, Tisya Bhatia via dev <[email protected]> 
> wrote:
> 
> On the case where we're grouping only by constants over empty input:
> 
> I checked the SQL standard and major dialects; they're pretty unanimous
> that the output should be 0 rows. Calcite already matches this because of
> the AggregateProjectPullUpConstantsRule (it won't reduce a non-empty GROUP
> BY to the empty one, it keeps at least one key), so GROUP BY ALL (which
> expands to the explicit GROUP BY <constants> returns 0 over empty input. I
> added an execution test to confirm this.
> 
> Best,
> Tisya Bhatia
> 
> On Tue, Jun 9, 2026 at 1:58 PM Julian Hyde <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>>> As for the 0-vs-1
>>> row question, happy to leave that to the existing behavior (and the
>>> language lawyers).
>> 
>> There’s no existing behavior, so you need to decide what is the right
>> behavior, and test it. See what the other implementations do.
>> 
>>> On Jun 9, 2026, at 11:14 AM, Tisya Bhatia via dev <
>> [email protected]> wrote:
>>> 
>>> Hey Julian -
>>> 
>>> Glad you're on board. Good point about the "contains an aggregate" vs.
>> "is
>>> an aggregate." My implementation (PR-pending) already does this: it
>> expands
>>> GROUP BY ALL to every SELECT expression that does not contain an
>> aggregate,
>>> using a recursive aggregate finder. So your example expands exactly as
>> you
>>> wrote.
>>> 
>>> I'll also take note of measures - thanks, I'd missed that. And I'll add
>>> your example as a test case.
>>> 
>>> Regarding the degenerate case (grouping only by constants over an empty
>>> input): GROUP BY ALL there expands to the exact same query as explicitly
>>> writing GROUP BY 'x', so it introduces no new semantics - it inherits
>>> whatever Calcite already does for constant-only grouping. As for the
>> 0-vs-1
>>> row question, happy to leave that to the existing behavior (and the
>>> language lawyers).
>>> 
>>> Thanks,
>>> Tisya Bhatia
>>> 
>>> On Tue, Jun 9, 2026 at 12:57 PM Julian Hyde <[email protected]
>> <mailto:[email protected]>> wrote:
>>> 
>>>> I agree; we should support GROUP BY ALL.
>>>> 
>>>> “Isn’t an aggregate” should be broadened to something like “doesn’t
>>>> contain an aggregate and isn’t a measure”.
>>>> 
>>>> SELECT deptno, SUBSTR(job, 1), count(*) + 1 as c, ‘x' as x
>>>> FROM emps
>>>> GROUP BY ALL
>>>> 
>>>> should be equivalent to
>>>> 
>>>> SELECT deptno, SUBSTR(job, 1), count(*) + 1 as c, ‘x' as x
>>>> FROM emps
>>>> GROUP BY deptno, SUBSTR(job, 1), ‘x'
>>>> 
>>>> I’m guessing that’s what the major implementations do. Measures are a
>>>> Calcite-specific thing.
>>>> 
>>>> Note that the constant ‘x’ gets swept into the GROUP BY clause but has
>> no
>>>> effect because it’s a constant.
>>>> When there are no rows and we’re only grouping by constants we have a
>>>> degenerate case and the language lawyers can argue about whether it
>> should
>>>> return 0 or 1 rows:
>>>> 
>>>> SELECT ‘x’ as x, COUNT(*)
>>>> FROM emps
>>>> WHERE FALSE
>>>> GROUP BY ALL
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>> 
>>>>> On Jun 9, 2026, at 10:25 AM, Tisya Bhatia via dev <
>>>> [email protected]> wrote:
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I'd like to propose support for GROUP BY ALL in Calcite - a shorthand
>>>> that
>>>>> groups every expression in the SELECT clause that isn't an aggregate.
>>>>> 
>>>>> Here is the Jira ticket:
>>>> 
>> https://urldefense.com/v3/__https://issues.apache.org/jira/browse/CALCITE-7594__;!!Ayb5sqE7!pEba7l9-97xJOhAdqD6pEBkcgCqK3bHZKZBQDYL8-qHcbXxmWfNc0DmlEL7m8Qm9RF2bQ-92FfXAOcqWtWv7jdJL$
>>>>> 
>>>>> Context:
>>>>> - Today, if I want to group by every non-aggregated expression in the
>>>>> SELECT list, I would have to repeat them in the group by.
>>>>> - A GROUP BY ALL shorthand has become convergence across warehouse
>>>>> dialects, and supporting it keeps Calcite-based dialects updated and
>>>>> aligned.
>>>>> 
>>>>> Proposal:
>>>>> - Add GROUP BY ALL: when ALL appears with no grouping items, group by
>>>> every
>>>>> SELECT expression that is not an aggregate or window function. It's
>>>>> implemented as a parser marker that the validator rewrites into the
>>>>> concrete grouping expressions before normal group validation runs, so
>> the
>>>>> converter / optimizer never see a marker.
>>>>> - Alternatives considered: I kept ALL (matches DuckDB, Spark, and
>>>> others).
>>>>> The existing ALL / DISTINCT set quantifier on grouping items
>>>> (CALCITE-5089,
>>>>> e.g. GROUP BY ALL CUBE(a, b)) is fully preserved. The new meaning only
>>>>> applies when no grouping items follow ALL.
>>>>> - Expected impact / tradeoff: purely additive. Bare GROUP BY ALL does
>> not
>>>>> parse today, so enabling it cannot change the meaning of any existing
>>>>> query. SELECT * with GROUP BY ALL is rejected with a clear error for
>> now
>>>>> (the star isn't expanded at validation time)
>>>>> 
>>>>> For example:
>>>>> 
>>>>> SELECT deptno, job, SUM(sal)
>>>>> FROM emp
>>>>> GROUP BY ALL;
>>>>> 
>>>>> resolves to
>>>>> 
>>>>> SELECT deptno, job, SUM(sal)
>>>>> FROM emp
>>>>> GROUP BY deptno, job;
>>>>> 
>>>>> Open questions:
>>>>> - Since this feature is additive, I don't think a conformance flag is
>>>>> needed, but would we prefer one?
>>>>> - Should SELECT * be supported in this proposal, or is the explicit
>> error
>>>>> enough for an initial cut?
>>>>> 
>>>>> If this direction sounds reasonable, I have a prototype ready (parser +
>>>>> validator + tests + reference docs) and will put up the PR soon. A
>>>>> companion ORDER BY ALL proposal (CALCITE-7597) is in the same vein.
>>>>> 
>>>>> Best,
>>>>> Tisya Bhatia
Re: [DISCUSS] Support GROUP BY ALL (CALCITE-7594)

Reply via email to