Re: Why are nested aggregations illegal? Best alternatives?

Gavin Ray Wed, 23 Feb 2022 15:31:12 -0800

Okay, I think maybe I have an idea what is going on

The parser grammar for "JsonNameAndValue":
https://github.com/apache/calcite/blob/82dd78a14f6aef2eeec2f9c94978d04b4acc5359/core/src/main/codegen/templates/Parser.jj#L6267-L6296


Has:
e = Expression(ExprContext.ACCEPT_NON_QUERY)

This enum value appears to be the one that says it can't contain queries:
https://github.com/apache/calcite/blob/82dd78a14f6aef2eeec2f9c94978d04b4acc5359/core/src/main/java/org/apache/calcite/sql/parser/SqlAbstractParserImpl.java#L328-L331

    /**
     * Accept only non-query expressions in this context.
     */
    ACCEPT_NON_QUERY,

So it's not a bug, but Calcite's parser just doesn't take query expressions
for JSON values.

I guess that changing this might break a bunch of stuff, and probably is
very difficult?

On Wed, Feb 23, 2022 at 6:10 PM Gavin Ray <ray.gavi...@gmail.com> wrote:

> Sorry, I realized this might be a time investment to try to debug/reproduce
> So I found a website that let me create the schema, rows, and query on
> various DB's
>
> Here is an example on Oracle and MySQL that you can view/modify in your
> browser:
>
> ORACLE:
> https://dbfiddle.uk/?rdbms=oracle_21&fiddle=2eed69c44fa63adf9830213163ba73d0
> MYSQL:
> https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=f94c7957eae4f5ebe4c879c17fbe64ea
>
> On Wed, Feb 23, 2022 at 2:59 PM Gavin Ray <ray.gavi...@gmail.com> wrote:
>
>> I think this actually is a bug in Calcite's parser or it's interpretation
>> I tested on H2, Oracle, and MySQL, the below is valid on those DB's but
>> fails on Calcite's parser:
>>
>> =================
>> H2/Oracle
>> =================
>> SELECT
>>     "houses"."id" AS "id",
>>     "houses"."address" AS "address",
>>     (
>>         SELECT
>>             JSON_ARRAYAGG(JSON_OBJECT(
>>                 KEY 'id' VALUE "users"."id",
>>                 KEY 'name' VALUE "users"."name",
>>                 KEY 'todos' VALUE (
>>                     SELECT
>>                         JSON_ARRAYAGG(JSON_OBJECT(
>>                             KEY 'id' VALUE "todos"."id",
>>                             KEY 'description' VALUE "todos"."description"
>>                         ))
>>                     FROM
>>                         "todos"
>>                     WHERE
>>                         "todos"."user_id" = "users"."id"
>>                 )
>>             ))
>>         FROM
>>             "users"
>>         WHERE
>>             "users"."house_id" = "houses"."id"
>>     ) "users"
>> FROM
>>     "houses"
>> GROUP BY
>>     "houses"."id",
>>     "houses"."address"
>>
>> =================
>> MySQL
>> =================
>> SELECT
>>     `houses`.`id` AS `id`,
>>     `houses`.`address` AS `address`,
>>     (
>>         SELECT
>>             JSON_ARRAYAGG(JSON_OBJECT(
>>                 'id', `users`.`id`,
>>                 'name', `users`.`name`,
>>                 'todos', (
>>                     SELECT
>>                         JSON_ARRAYAGG(JSON_OBJECT(
>>                             'id', `todos`.`id`,
>>                             'description', `todos`.`description`
>>                         ))
>>                     FROM
>>                         `todos`
>>                     WHERE
>>                         `todos`.`user_id` = `users`.`id`
>>                 )
>>             ))
>>         FROM
>>             `users`
>>         WHERE
>>             `users`.`house_id` = `houses`.`id`
>>     ) `users`
>> FROM
>>     `houses`
>> GROUP BY
>>     `houses`.`id`,
>>     `houses`.`address`
>>
>> On Wed, Feb 23, 2022 at 12:52 PM Gavin Ray <ray.gavi...@gmail.com> wrote:
>>
>>> That's a valid point, let me see check what some common dialects will
>>> accept for this sort of thing
>>>
>>> On Wed, Feb 23, 2022 at 12:36 PM Julian Hyde <jhyde.apa...@gmail.com>
>>> wrote:
>>>
>>>> It’s a parser error. That usually means that the user has made a
>>>> mistake.
>>>>
>>>> Try your SQL on another DB with JSON support before you declare this a
>>>> Calcite bug.
>>>>
>>>> Julian
>>>>
>>>> > On Feb 23, 2022, at 09:22, Gavin Ray <ray.gavi...@gmail.com> wrote:
>>>> >
>>>> > No dice still unfortunately =/
>>>> >
>>>> > If it's any easier, I put a self-contained single class reproduction
>>>> on
>>>> > Github:
>>>> >
>>>> https://github.com/GavinRay97/calcite-nested-json-subquery-bug/blob/main/src/test/java/com/example/AppTest.java
>>>> >
>>>> > Is it worth filing a JIRA ticket over you think?
>>>> >
>>>> >
>>>> >> On Wed, Feb 23, 2022 at 3:45 AM Julian Hyde <jhyde.apa...@gmail.com>
>>>> wrote:
>>>> >>
>>>> >> Try ‘value ((‘ in place of ‘value (‘.
>>>> >>
>>>> >> Julian
>>>> >>
>>>> >>>> On Feb 21, 2022, at 9:33 AM, Gavin Ray <ray.gavi...@gmail.com>
>>>> wrote:
>>>> >>>
>>>> >>> I hadn't thought about the fact that ORM's probably have to solve
>>>> this
>>>> >>> problem as well
>>>> >>> That is a great suggestion, I will try to investigate some of the
>>>> popular
>>>> >>> ORM codebases and see if there are any tricks they are using.
>>>> >>>
>>>> >>> I seem to maybe be getting a tiny bit closer by using subqueries
>>>> like
>>>> >>> Julian suggested instead of operator calls
>>>> >>> But if I may ask what is probably a very stupid question:
>>>> >>>
>>>> >>> What might the error message
>>>> >>> "parse failed: Query expression encountered in illegal context
>>>> >>> (state=,code=0)"
>>>> >>>
>>>> >>> Mean in the below query?
>>>> >>>
>>>> >>> The reason why I am confused is because the query runs if I remove
>>>> the
>>>> >>> innermost subquery ("todos")
>>>> >>> But the innermost subquery is a direct copy-paste of the subquery
>>>> above
>>>> >> it,
>>>> >>> so I know it MUST be valid
>>>> >>>
>>>> >>> As usual, thank you so much for your help/guidance Stamatis.
>>>> >>>
>>>> >>> select
>>>> >>> "g0"."id" "id",
>>>> >>> "g0"."address" "address",
>>>> >>> (
>>>> >>>   select json_arrayagg(json_object(
>>>> >>>     key 'id' value "g1"."id",
>>>> >>>     key 'todos' value (
>>>> >>>       select json_arrayagg(json_object(
>>>> >>>         key 'id' value "g2"."id",
>>>> >>>         key 'description' value "g2"."description",
>>>> >>>       ))
>>>> >>>       from (
>>>> >>>         select * from "todos"
>>>> >>>         where "g1"."id" = "user_id"
>>>> >>>         order by "id"
>>>> >>>       ) "g2"
>>>> >>>     )
>>>> >>>   ))
>>>> >>>   from (
>>>> >>>     select * from "users"
>>>> >>>     where "g0"."id" = "house_id"
>>>> >>>     order by "id"
>>>> >>>   ) "g1"
>>>> >>> ) "users"
>>>> >>> from "houses" "g0"
>>>> >>> order by "g0"."id"
>>>> >>>
>>>> >>>> On Mon, Feb 21, 2022 at 8:07 AM Stamatis Zampetakis <
>>>> zabe...@gmail.com>
>>>> >>>> wrote:
>>>> >>>>
>>>> >>>> Hi Gavin,
>>>> >>>>
>>>> >>>> A few more comments in case they help to get you a bit further on
>>>> your
>>>> >>>> work.
>>>> >>>>
>>>> >>>> The need to return the result as a single object is a common
>>>> problem in
>>>> >>>> object relational mapping (ORM) frameworks/APIS (JPA, Datanucleus,
>>>> >>>> Hibernate, etc.). Apart from the suggestions so far maybe you
>>>> could look
>>>> >>>> into these frameworks as well for more inspiration.
>>>> >>>>
>>>> >>>> Moreover your approach of decomposing the query into individual
>>>> parts is
>>>> >>>> commonly known as the N+1 problem [1].
>>>> >>>>
>>>> >>>> Lastly, keep in mind that you can introduce custom UDF, UDAF
>>>> functions
>>>> >> if
>>>> >>>> you need more flexibility on reconstructing the final result.
>>>> >>>>
>>>> >>>> Best,
>>>> >>>> Stamatis
>>>> >>>>
>>>> >>>> [1]
>>>> >>>>
>>>> >>>>
>>>> >>
>>>> https://stackoverflow.com/questions/97197/what-is-the-n1-selects-problem-in-orm-object-relational-mapping
>>>> >>>>
>>>> >>>>> On Sun, Feb 13, 2022 at 3:59 AM Gavin Ray <ray.gavi...@gmail.com>
>>>> >> wrote:
>>>> >>>>>
>>>> >>>>> Ah wait nevermind, got excited and spoke too soon. Looking at it
>>>> more
>>>> >>>>> closely, that data isn't correct.
>>>> >>>>> At least it's in somewhat the right shape, ha!
>>>> >>>>>
>>>> >>>>>> On Sat, Feb 12, 2022 at 9:57 PM Gavin Ray <ray.gavi...@gmail.com
>>>> >
>>>> >> wrote:
>>>> >>>>>
>>>> >>>>>> After ~5 hours, I think I may have made some progress =)
>>>> >>>>>>
>>>> >>>>>> I have this, which currently works. The problem is that the
>>>> nested
>>>> >>>>> columns
>>>> >>>>>> don't have names on them.
>>>> >>>>>> Since I need to return a nested "Map<String, Object>", I have to
>>>> >> figure
>>>> >>>>>> out how to convert this query into a form that gives column
>>>> names.
>>>> >>>>>>
>>>> >>>>>> But this is still great progress I think!
>>>> >>>>>>
>>>> >>>>>> SELECT
>>>> >>>>>>   "todos".*,
>>>> >>>>>>   ARRAY(
>>>> >>>>>>       SELECT
>>>> >>>>>>           "users".*,
>>>> >>>>>>           ARRAY(
>>>> >>>>>>               SELECT
>>>> >>>>>>                   "todos".*
>>>> >>>>>>               FROM
>>>> >>>>>>                   "todos"
>>>> >>>>>>           ) AS "todos"
>>>> >>>>>>       FROM
>>>> >>>>>>           "users"
>>>> >>>>>>   ) AS "users"
>>>> >>>>>> FROM
>>>> >>>>>>   "todos"
>>>> >>>>>> WHERE
>>>> >>>>>>   "user_id" IN (
>>>> >>>>>>       SELECT
>>>> >>>>>>           "user_id"
>>>> >>>>>>       FROM
>>>> >>>>>>           "users"
>>>> >>>>>>       WHERE
>>>> >>>>>>           "house_id" IN (
>>>> >>>>>>               SELECT
>>>> >>>>>>                   "id"
>>>> >>>>>>               FROM
>>>> >>>>>>                   "houses"
>>>> >>>>>>           )
>>>> >>>>>>   );
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>
>>>> >>>>
>>>> >>
>>>> +----+---------+------------------------+------------------------------------------------------------------------------+
>>>> >>>>>> | id | user_id |      description       |
>>>> >>>>>>                                           |
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>
>>>> >>>>
>>>> >>
>>>> +----+---------+------------------------+------------------------------------------------------------------------------+
>>>> >>>>>> | 1  | 1       | Take out the trash     | [{1, John, 1, [{1, 1,
>>>> Take
>>>> >>>> out
>>>> >>>>>> the trash}, {2, 1, Watch my favorite show}, { |
>>>> >>>>>> | 2  | 1       | Watch my favorite show | [{1, John, 1, [{1, 1,
>>>> Take
>>>> >>>> out
>>>> >>>>>> the trash}, {2, 1, Watch my favorite show}, { |
>>>> >>>>>> | 3  | 1       | Charge my phone        | [{1, John, 1, [{1, 1,
>>>> Take
>>>> >>>> out
>>>> >>>>>> the trash}, {2, 1, Watch my favorite show}, { |
>>>> >>>>>> | 4  | 2       | Cook dinner            | [{1, John, 1, [{1, 1,
>>>> Take
>>>> >>>> out
>>>> >>>>>> the trash}, {2, 1, Watch my favorite show}, { |
>>>> >>>>>> | 5  | 2       | Read a book            | [{1, John, 1, [{1, 1,
>>>> Take
>>>> >>>> out
>>>> >>>>>> the trash}, {2, 1, Watch my favorite show}, { |
>>>> >>>>>> | 6  | 2       | Organize office        | [{1, John, 1, [{1, 1,
>>>> Take
>>>> >>>> out
>>>> >>>>>> the trash}, {2, 1, Watch my favorite show}, { |
>>>> >>>>>> | 7  | 3       | Walk the dog           | [{1, John, 1, [{1, 1,
>>>> Take
>>>> >>>> out
>>>> >>>>>> the trash}, {2, 1, Watch my favorite show}, { |
>>>> >>>>>> | 8  | 3       | Feed the cat           | [{1, John, 1, [{1, 1,
>>>> Take
>>>> >>>> out
>>>> >>>>>> the trash}, {2, 1, Watch my favorite show}, { |
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>
>>>> >>>>
>>>> >>
>>>> +----+---------+------------------------+------------------------------------------------------------------------------+
>>>> >>>>>>
>>>> >>>>>> On Sat, Feb 12, 2022 at 4:13 PM Gavin Ray <ray.gavi...@gmail.com
>>>> >
>>>> >>>> wrote:
>>>> >>>>>>
>>>> >>>>>>> Nevermind, this is a standard term not something
>>>> Calcite-specific it
>>>> >>>>>>> seems!
>>>> >>>>>>>
>>>> >>>>>>> https://en.wikipedia.org/wiki/Correlated_subquery
>>>> >>>>>>>
>>>> >>>>>>> On Sat, Feb 12, 2022 at 3:46 PM Gavin Ray <
>>>> ray.gavi...@gmail.com>
>>>> >>>>> wrote:
>>>> >>>>>>>
>>>> >>>>>>>> Forgive my ignorance/lack of experience
>>>> >>>>>>>>
>>>> >>>>>>>> I am somewhat familiar with the ARRAY() function, but not sure
>>>> I
>>>> >> know
>>>> >>>>>>>> the term "correlated"
>>>> >>>>>>>> Searching the Calcite codebase for uses of "correlated" +
>>>> "query", I
>>>> >>>>>>>> found:
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>
>>>> >>>>
>>>> >>
>>>> https://github.com/apache/calcite/blob/1d4f1b394bfdba03c5538017e12ab2431b137ca9/core/src/test/java/org/apache/calcite/test/SqlToRelConverterTest.java#L1603-L1612
>>>> >>>>>>>>
>>>> >>>>>>>> @Test void testCorrelatedSubQueryInJoin() {
>>>> >>>>>>>>   final String sql = "select *\n"
>>>> >>>>>>>>       + "from emp as e\n"
>>>> >>>>>>>>       + "join dept as d using (deptno)\n"
>>>> >>>>>>>>       + "where d.name = (\n"
>>>> >>>>>>>>       + "  select max(name)\n"
>>>> >>>>>>>>       + "  from dept as d2\n"
>>>> >>>>>>>>       + "  where d2.deptno = d.deptno)";
>>>> >>>>>>>>   sql(sql).withExpand(false).ok();
>>>> >>>>>>>> }
>>>> >>>>>>>>
>>>> >>>>>>>> But I also see this, which says it is "uncorrelated" but seems
>>>> very
>>>> >>>>>>>> similar?
>>>> >>>>>>>>
>>>> >>>>>>>> @Test void testInUncorrelatedSubQuery() {
>>>> >>>>>>>>   final String sql = "select empno from emp where deptno in"
>>>> >>>>>>>>       + " (select deptno from dept)";
>>>> >>>>>>>>   sql(sql).ok();
>>>> >>>>>>>> }
>>>> >>>>>>>>
>>>> >>>>>>>> I wouldn't blame you for not answering such a basic question
>>>> -- but
>>>> >>>>> what
>>>> >>>>>>>> exactly does "correlation" mean here?
>>>> >>>>>>>>
>>>> >>>>>>>> Thanks, as usual Julian
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>> On Sat, Feb 12, 2022 at 3:08 PM Julian Hyde <
>>>> jhyde.apa...@gmail.com
>>>> >>>
>>>> >>>>>>>> wrote:
>>>> >>>>>>>>
>>>> >>>>>>>>> Correlated ARRAY sub-query?
>>>> >>>>>>>>>
>>>> >>>>>>>>>> On Feb 12, 2022, at 10:40 AM, Gavin Ray <
>>>> ray.gavi...@gmail.com>
>>>> >>>>>>>>> wrote:
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Apologies for the delay in replying
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> This makes things clear and seems obvious now that you point
>>>> it
>>>> >>>> out.
>>>> >>>>>>>>>> Thank you, Justin and Julian =)
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Let me ask another question (if I may) that I am struggling
>>>> to
>>>> >>>>> phrase
>>>> >>>>>>>>>> easily.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> So with GraphQL, you might have a query like:
>>>> >>>>>>>>>> - "Get houses"
>>>> >>>>>>>>>> - "For each house get the user that lives in the house
>>>> >>>>>>>>>> - "And for each user get their list of todos"
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> The result has to come back such that it's a single object
>>>> for
>>>> >>>> each
>>>> >>>>>>>>> row
>>>> >>>>>>>>>> ==============
>>>> >>>>>>>>>> {
>>>> >>>>>>>>>> houses: [{
>>>> >>>>>>>>>>  address: "123 Main Street",
>>>> >>>>>>>>>>  users: [{
>>>> >>>>>>>>>>    name: "Joe",
>>>> >>>>>>>>>>    todos: [{
>>>> >>>>>>>>>>      description: "Take out trash"
>>>> >>>>>>>>>>    }]
>>>> >>>>>>>>>>  }]
>>>> >>>>>>>>>> }
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> From a SQL perspective, the logical equivalent would be
>>>> something
>>>> >>>>>>>>> like:
>>>> >>>>>>>>>> ==============
>>>> >>>>>>>>>> SELECT
>>>> >>>>>>>>>>  house.address,
>>>> >>>>>>>>>>  (somehow nest users + double-nest todos under user)
>>>> >>>>>>>>>> FROM
>>>> >>>>>>>>>>  house
>>>> >>>>>>>>>> JOIN
>>>> >>>>>>>>>>  user ON user.house_id = house.id
>>>> >>>>>>>>>>  todos ON todos.user_id = user.id
>>>> >>>>>>>>>> WHERE
>>>> >>>>>>>>>>  house.id = 1
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> I'm not familiar enough with SQL to have figured out a way
>>>> to make
>>>> >>>>>>>>> this
>>>> >>>>>>>>>> kind of
>>>> >>>>>>>>>> query using operators that are supported across most of the
>>>> DB's
>>>> >>>>>>>>> Calcite has
>>>> >>>>>>>>>> adapters for.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Currently what I have done instead, on a tip from
>>>> Gopalakrishna
>>>> >>>>> Holla
>>>> >>>>>>>>> from
>>>> >>>>>>>>>> LinkedIn Coral team who has built GraphQL-on-Calcite, was to
>>>> break
>>>> >>>>> up
>>>> >>>>>>>>> the
>>>> >>>>>>>>>> query
>>>> >>>>>>>>>> into individual parts and then do the join in-memory:
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> SELECT ... FROM users;
>>>> >>>>>>>>>> SELECT ... FROM todos WHERE todos.user_id IN (ResultSet from
>>>> prev
>>>> >>>>>>>>> response);
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> However, the way I am doing this seems like it's probably
>>>> very
>>>> >>>>>>>>> inefficient.
>>>> >>>>>>>>>> Because I do a series of nested loops to add the key to each
>>>> >>>> object
>>>> >>>>>>>>> in the
>>>> >>>>>>>>>> parent ResultSet row:
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>
>>>> >>>>
>>>> >>
>>>> https://github.com/GavinRay97/GraphQLCalcite/blob/648e0ac4f6810a3c360d13a03e6597c33406de4b/src/main/kotlin/TableDataFetcher.kt#L135-L153
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Is there some better way of doing this?
>>>> >>>>>>>>>> I would be eternally grateful for any advice.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> On Thu, Feb 10, 2022 at 3:27 PM Julian Hyde <
>>>> >>>> jhyde.apa...@gmail.com
>>>> >>>>>>
>>>> >>>>>>>>> wrote:
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>> Yes, if you want to do multiple layers of aggregation, use
>>>> CTEs
>>>> >>>>>>>>> (WITH) or
>>>> >>>>>>>>>>> nested sub-queries. For example, the following is I believe
>>>> valid
>>>> >>>>>>>>> standard
>>>> >>>>>>>>>>> SQL, and actually computes something useful:
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> WITH q1 AS
>>>> >>>>>>>>>>> (SELECT deptno, job, AVG(sal) AS avg_sal
>>>> >>>>>>>>>>>  FROM emp
>>>> >>>>>>>>>>>  GROUP BY deptno, job)
>>>> >>>>>>>>>>> WITH q2 AS
>>>> >>>>>>>>>>> (SELECT deptno, AVG(avg_sal) AS avg_avg_sal
>>>> >>>>>>>>>>>  FROM q1
>>>> >>>>>>>>>>>  GROUP BY deptno)
>>>> >>>>>>>>>>> SELECT AVG(avg_avg_sal)
>>>> >>>>>>>>>>> FROM q2
>>>> >>>>>>>>>>> GROUP BY ()
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> (You can omit the “GROUP BY ()” line, but I think it makes
>>>> things
>>>> >>>>>>>>> clearer.)
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Julian
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>> On Feb 10, 2022, at 12:17 PM, Justin Swanhart <
>>>> >>>>> greenl...@gmail.com>
>>>> >>>>>>>>>>> wrote:
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> I wish you could unsend emails :)  Answering my own
>>>> question,
>>>> >>>> no,
>>>> >>>>>>>>> because
>>>> >>>>>>>>>>>> that would return three rows with the average :D
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> On Thu, Feb 10, 2022 at 3:16 PM Justin Swanhart <
>>>> >>>>>>>>> greenl...@gmail.com>
>>>> >>>>>>>>>>> wrote:
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>>> Just out of curiosity, is the second level aggregation
>>>> using
>>>> >>>> AVG
>>>> >>>>>>>>> in a
>>>> >>>>>>>>>>>>> window context?  It the frame is the whole table and it
>>>> >>>>> aggregates
>>>> >>>>>>>>> over
>>>> >>>>>>>>>>> it?
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> On Thu, Feb 10, 2022 at 3:12 PM Justin Swanhart <
>>>> >>>>>>>>> greenl...@gmail.com>
>>>> >>>>>>>>>>>>> wrote:
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> That is really neat about Oracle.
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> The alternative in general is to use a subquery:
>>>> >>>>>>>>>>>>>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno;
>>>> >>>>>>>>>>>>>> becomes
>>>> >>>>>>>>>>>>>> select avg(the_avg)
>>>> >>>>>>>>>>>>>> from (select avg(sal) from emp group b deptno) an_alias;
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> or
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> with the_cte as (select avg(sal) x from emp group by
>>>> deptno)
>>>> >>>>>>>>>>>>>> select avg(x) from the_cte;
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> On Thu, Feb 10, 2022 at 3:03 PM Julian Hyde <
>>>> >>>>>>>>> jhyde.apa...@gmail.com>
>>>> >>>>>>>>>>>>>> wrote:
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> Some databases, e.g. Oracle, allow TWO levels of
>>>> nesting:
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> SELECT avg(sal) FROM emp GROUP BY deptno;
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> AVG(SAL)
>>>> >>>>>>>>>>>>>>> ========
>>>> >>>>>>>>>>>>>>> 1,566.67
>>>> >>>>>>>>>>>>>>> 2,175.00
>>>> >>>>>>>>>>>>>>> 2,916.65
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> SELECT avg(avg(sal)) FROM emp GROUP BY deptno;
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> AVG(SUM(SAL))
>>>> >>>>>>>>>>>>>>> =============
>>>> >>>>>>>>>>>>>>>     9,675
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> The first level aggregates by department (returning 3
>>>> >>>> records),
>>>> >>>>>>>>> and
>>>> >>>>>>>>>>> the
>>>> >>>>>>>>>>>>>>> second level computes the grand total (returning 1
>>>> record).
>>>> >>>> But
>>>> >>>>>>>>> that
>>>> >>>>>>>>>>> is an
>>>> >>>>>>>>>>>>>>> exceptional case.
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> Generally, any expression in the SELECT or HAVING
>>>> clause of
>>>> >>>> an
>>>> >>>>>>>>>>> aggregate
>>>> >>>>>>>>>>>>>>> query is either ‘before’ or ‘after’ aggregation.
>>>> Consider
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> SELECT t.x + 1 AS a, 2 + SUM(t.y + 3) AS b
>>>> >>>>>>>>>>>>>>> FROM t
>>>> >>>>>>>>>>>>>>> GROUP BY t.x
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> The expressions “t.y” and “t.y + 3” occur before
>>>> aggregation;
>>>> >>>>>>>>> “t.x”,
>>>> >>>>>>>>>>>>>>> “t.x + 1”, “SUM(t.y + 3)” and “2 + SUM(t.y + 3)” occur
>>>> after
>>>> >>>>>>>>>>> aggregation.
>>>> >>>>>>>>>>>>>>> SQL semantics rely heavily on this stratification.
>>>> Allowing
>>>> >>>> an
>>>> >>>>>>>>> extra
>>>> >>>>>>>>>>> level
>>>> >>>>>>>>>>>>>>> of aggregation would mess it all up.
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> Julian
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> On Feb 10, 2022, at 9:45 AM, Justin Swanhart <
>>>> >>>>>>>>> greenl...@gmail.com>
>>>> >>>>>>>>>>>>>>> wrote:
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> This is a SQL limitation.
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> mysql> select sum(1);
>>>> >>>>>>>>>>>>>>>> +--------+
>>>> >>>>>>>>>>>>>>>> | sum(1) |
>>>> >>>>>>>>>>>>>>>> +--------+
>>>> >>>>>>>>>>>>>>>> |      1 |
>>>> >>>>>>>>>>>>>>>> +--------+
>>>> >>>>>>>>>>>>>>>> 1 row in set (0.00 sec)
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> mysql> select sum(sum(1));
>>>> >>>>>>>>>>>>>>>> ERROR 1111 (HY000): Invalid use of group function
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> On Thu, Feb 10, 2022 at 12:39 PM Gavin Ray <
>>>> >>>>>>>>> ray.gavi...@gmail.com>
>>>> >>>>>>>>>>>>>>> wrote:
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> Went to test this query out and found that it can't be
>>>> >>>>>>>>> performed:
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> SELECT
>>>> >>>>>>>>>>>>>>>>> JSON_OBJECT(
>>>> >>>>>>>>>>>>>>>>>    KEY 'users'
>>>> >>>>>>>>>>>>>>>>>    VALUE JSON_ARRAYAGG(
>>>> >>>>>>>>>>>>>>>>>        JSON_OBJECT(
>>>> >>>>>>>>>>>>>>>>>            KEY 'name' VALUE "users"."name",
>>>> >>>>>>>>>>>>>>>>>            KEY 'todos' VALUE JSON_ARRAYAGG(
>>>> >>>>>>>>>>>>>>>>>                JSON_OBJECT(
>>>> >>>>>>>>>>>>>>>>>                    KEY 'description' VALUE
>>>> >>>>>>>>> "todos"."description"
>>>> >>>>>>>>>>>>>>>>>                )
>>>> >>>>>>>>>>>>>>>>>            )
>>>> >>>>>>>>>>>>>>>>>        )
>>>> >>>>>>>>>>>>>>>>>    )
>>>> >>>>>>>>>>>>>>>>> )
>>>> >>>>>>>>>>>>>>>>> FROM
>>>> >>>>>>>>>>>>>>>>> "users"
>>>> >>>>>>>>>>>>>>>>> LEFT OUTER JOIN
>>>> >>>>>>>>>>>>>>>>> "todos" ON "users"."id" = "todos"."user_id";
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> Checking the source, seems this is a blanket policy,
>>>> not a
>>>> >>>>>>>>>>>>>>>>> datasource-specific thing.
>>>> >>>>>>>>>>>>>>>>> From a functional perspective, it doesn't feel like
>>>> it's
>>>> >>>> much
>>>> >>>>>>>>>>>>>>> different
>>>> >>>>>>>>>>>>>>>>> from JOINs
>>>> >>>>>>>>>>>>>>>>> But I don't understand relational theory or DB
>>>> >>>> functionality
>>>> >>>>>>>>> in the
>>>> >>>>>>>>>>>>>>> least,
>>>> >>>>>>>>>>>>>>>>> so I'm not fit to judge.
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> Just curious why Calcite doesn't allow this
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>
>>>> >>>>
>>>> >>
>>>>
>>>

Re: Why are nested aggregations illegal? Best alternatives?

Reply via email to