Re: Modify Calcite Planner in Hive to remove GROUP BY

Vineet Garg Wed, 26 Jun 2019 19:04:13 -0700

Hi Julian,

You are right, it should produce zero rows, not NULL. Thanks for the
correction.
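
For anyone who wants to see the difference locally, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here (an assumption for illustration, not Hive, Calcite or Kylin), and the helper name count_rows is made up for this example. The table name empty_table is taken from the query quoted below.

```python
import sqlite3


def count_rows(group_by_constant: bool):
    """Run count(*) on an empty table, with or without GROUP BY <constant>."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE empty_table (x INTEGER)")
        sql = "SELECT count(*) FROM empty_table"
        if group_by_constant:
            # A string literal is used as the constant, because SQLite reads a
            # bare integer in GROUP BY as a result-column position.
            sql += " GROUP BY 'c'"
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


grouped = count_rows(True)     # no groups exist, so no rows come back
ungrouped = count_rows(False)  # a single row holding the count 0

print("with GROUP BY <constant>:", grouped)
print("without GROUP BY:", ungrouped)
```

The grouped query returns no rows at all, while the ungrouped one returns one row containing 0, which is why dropping the GROUP BY changes the result on an empty input.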


Vineet


On Wed, Jun 26, 2019 at 4:49 PM Julian Hyde <[email protected]> wrote:

> > Select count(*) from empty_table group by <constant> will produce NULL
>
> Really? I thought it should produce zero rows.
>
> Hsqldb:
>
> > select count(*) from "foodmart"."days" where false group by true;
> +-----------------+
> |       C1        |
> +-----------------+
> +-----------------+
> No rows selected (0.001 seconds)
>
>
> Julian
>
>
> > On Jun 26, 2019, at 1:12 PM, Vineet Garg <[email protected]> wrote:
> >
> > Hello Krzysztof,
> >
> > The rewrite you mention in Hive was done in HIVE-19674
> > <https://issues.apache.org/jira/browse/HIVE-19674> so that such
> > GROUP BY queries can be pushed to Druid. Currently there is no way to
> > disable this rewrite.
> >
> > As for removing GROUP BY <constant>, there are rules/rewrites that can
> > reduce grouping keys by removing constants, but removing the whole GROUP BY
> > is not safe, since it can lead to a semantically different query,
> > e.g. Select count(*) from empty_table group by <constant> will produce
> > NULL but Select count(*) from empty_table will produce 0.
> >
> > P.S. There was a bug in the HIVE-19674 patch, which was later fixed by
> > HIVE-21539 <https://issues.apache.org/jira/browse/HIVE-21539>.
> >
> > Regards,
> > Vineet Garg
> >
> > On Wed, Jun 26, 2019 at 7:08 AM Haisheng Yuan <[email protected]>
> > wrote:
> >
> >> Calcite has a rule that does this. But you can't remove the GROUP BY
> >> clause if the constant is the only group key: the semantics are different
> >> without a group key. Try it on an empty relation and you will see the
> >> difference.
> >>
> >>
> >>
> >>
> >>
> >> Thanks~
> >> Haisheng Yuan
> >> ------------------------------------------------------------------
> >> From: Krzysztof Zarzycki <[email protected]>
> >> Date: 2019-06-26 21:52:41
> >> To: <[email protected]>
> >> Subject: Modify Calcite Planner in Hive to remove GROUP BY <constant>
> >>
> >> Hello,
> >>
> >> While my question might look like it is about Hive, I believe it is
> >> more about Calcite. I need to add a Calcite planner rule to Hive that
> >> removes the GROUP BY clause when it groups by some constant value (GROUP BY
> >> TRUE, more precisely). As far as I can tell, the query is semantically the
> >> same. Could anyone on this mailing list help me do this properly? While I'm
> >> an experienced Java engineer, I have no clue how to achieve this.
> >> I tried modifying the Hive code to do this myself, but unfortunately I got
> >> only NullPointerExceptions.
> >>
> >>
> >> More context below:
> >> I want to use JdbcStorageHandler in Hive to connect to Apache Kylin and
> >> forward queries there. Then I put Tableau on top of Hive. Unfortunately,
> >> the queries sent by Tableau to Hive, and then regenerated by the Calcite
> >> planner for Kylin, cannot be handled by Kylin (which, BTW, uses Calcite as
> >> well). I disabled some of the Hive optimizations, which fixed some of my
> >> queries. But I'm stuck on one I cannot disable. Tableau generates a query
> >> with "GROUP BY 1.000000...01", which is translated to "GROUP BY TRUE" by
> >> Hive/Calcite. But neither of those can be handled by Kylin. My idea is
> >> to remove the GROUP BY completely because, in my understanding, it's
> >> unnecessary.
> >>
> >> I will be very grateful for your help,
> >> Kind Regards,
> >> Krzysztof
> >>
> >>
>
>
