Re: Modify Calcite Planner in Hive to remove GROUP BY

Krzysztof Zarzycki Tue, 09 Jul 2019 01:26:45 -0700

Thanks for your explanation, it helps a lot!
I was badly mistaken in thinking that the result should be the same.
Possibly Tableau adds "group by <constant>" intentionally, so that it
receives zero rows for an empty input.
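
For the record, the difference discussed in this thread can be reproduced
with a small script. This is only a sketch: it uses SQLite through Python's
standard sqlite3 module as a stand-in engine (an assumption, not Hive, Kylin
or Hsqldb), the table name empty_table is taken from Vineet's example, and
the grouping constant is a string literal because SQLite reads a bare integer
in GROUP BY as a column position.

```python
import sqlite3


def run(sql):
    """Run one query against a fresh in-memory database holding an empty table."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table; it is created empty and never populated.
        conn.execute("CREATE TABLE empty_table (x INTEGER)")
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Without GROUP BY the aggregate always yields exactly one row.
no_group_by = run("SELECT count(*) FROM empty_table")

# With GROUP BY <constant> there is one row per group, and an empty
# input has no groups at all.
group_by_constant = run("SELECT count(*) FROM empty_table GROUP BY 'c'")

print(no_group_by)        # [(0,)]
print(group_by_constant)  # []
```

So dropping the GROUP BY turns "no rows" into "one row with 0", which is
why the rewrite is not safe when the constant is the only grouping key.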


But that means... it is an even bigger bug/omission that Apache Kylin does
not handle grouping by a constant. So I'm afraid I cannot do anything at
the Calcite level (such as a rewrite); I need to work on Kylin. (Or does
someone have a different idea?)
I will raise an issue in the Kylin Jira then.

Krzysztof



On Thu, 27 Jun 2019 at 04:03, Vineet Garg <[email protected]> wrote:

> Hi Julian,
>
> You are right it should produce zero rows not NULL. Thanks for the
> correction.
>
> Vineet
>
>
> On Wed, Jun 26, 2019 at 4:49 PM Julian Hyde <[email protected]> wrote:
>
> > > Select count(*) from empty_table group by <constant> will produce NULL
> >
> > Really? I thought it should produce zero rows.
> >
> > Hsqldb:
> >
> > > select count(*) from "foodmart"."days" where false group by true;
> > +-----------------+
> > |       C1        |
> > +-----------------+
> > +-----------------+
> > No rows selected (0.001 seconds)
> >
> >
> > Julian
> >
> >
> > > On Jun 26, 2019, at 1:12 PM, Vineet Garg <[email protected]> wrote:
> > >
> > > Hello Krzysztof,
> > >
> > > The rewrite you mention in Hive was done in HIVE-19674
> > > <https://issues.apache.org/jira/browse/HIVE-19674> to be able to push
> > > such group by to Druid. Currently there is no way to disable this
> > > rewrite.
> > >
> > > As for removing Group by <constant>, there are rules/rewrites which can
> > > reduce grouping keys by removing constants but removing whole group by
> > > is not safe since it can lead to semantically different query.
> > > e.g. Select count(*) from empty_table group by <constant> will produce
> > > NULL but Select count(*) from empty_table will produce 0.
> > >
> > > P.S. There was a bug in HIVE-19674's patch which was further fixed by
> > > HIVE-21539 <https://issues.apache.org/jira/browse/HIVE-21539>.
> > >
> > > Regards,
> > > Vineet Garg
> > >
> > > On Wed, Jun 26, 2019 at 7:08 AM Haisheng Yuan <[email protected]>
> > > wrote:
> > >
> > >> Calcite has the rule that does the work. But you can't remove the
> > >> group by clause if the constant is the only group key. The semantic
> > >> is different without group key. Try it on empty relation, you will
> > >> see the difference.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> Thanks~
> > >> Haisheng Yuan
> > >> ------------------------------------------------------------------
> > >> From: Krzysztof Zarzycki<[email protected]>
> > >> Date: 2019-06-26 21:52:41
> > >> To: <[email protected]>
> > >> Subject: Modify Calcite Planner in Hive to remove GROUP BY <constant>
> > >>
> > >> Hello,
> > >>
> > >> While my question might look like it is about Hive, I believe it is
> > >> more about Calcite. I need to add a Calcite plan rule to Hive that
> > >> removes the "Group by" clause when it groups by some constant value
> > >> (GROUP BY TRUE, more precisely). As far as I believe, the query is
> > >> semantically the same.
> > >> Could anyone on this mailing list help me do it properly? While I'm
> > >> an experienced Java engineer, I have no clue how to achieve this.
> > >> I was trying to modify the Hive code to do this myself, but
> > >> unfortunately I got only NullPointerExceptions.
> > >>
> > >>
> > >> More context below:
> > >> I want to use JdbcStorageHandler in Hive, which connects to Apache
> > >> Kylin and forwards queries there. Then I put Tableau on top of Hive.
> > >> Unfortunately, the queries produced by Tableau for Hive and then
> > >> rewritten by the Calcite planner for Kylin cannot be handled by Kylin
> > >> (which, BTW, uses Calcite as well). I disabled some of the Hive
> > >> optimizations, which fixed some of my queries. But I'm stuck on one I
> > >> cannot disable. Tableau generates a query with
> > >> "GROUP BY 1.000000...01", which is translated to "GROUP BY TRUE" by
> > >> Hive/Calcite. But neither of those can be handled by Kylin. I got the
> > >> idea of removing GROUP BY completely, because in my understanding
> > >> it's unnecessary.
> > >>
> > >> I will be very grateful for your help,
> > >> Kind Regards,
> > >> Krzysztof
> > >>
> > >>
> >
> >
>
