[ 
https://issues.apache.org/jira/browse/HIVE-16026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra reassigned HIVE-16026:
-------------------------------------

    Assignee: slim bouguerra

> Generated query will timeout and/or kill the druid cluster.
> -----------------------------------------------------------
>
>                 Key: HIVE-16026
>                 URL: https://issues.apache.org/jira/browse/HIVE-16026
>             Project: Hive
>          Issue Type: Bug
>          Components: Druid integration
>            Reporter: slim bouguerra
>            Assignee: slim bouguerra
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Grouping by `__time` and another dimension generates a Druid groupBy query 
> with granularity NONE and an interval from 1970 to 3000. This can kill the 
> Druid cluster, because the Druid groupBy strategy creates a cursor for every 
> millisecond and there are a lot of milliseconds between 1970 and 3000. Such a 
> query can instead be turned into a select, with the group by done within 
> Hive. This should only happen when we don't know the `__time` granularity.
> {code}
> explain select `__time`, userid from login_druid group by `__time`, userid
>     > ;
> OK
> Plan optimized by CBO.
> Stage-0
>   Fetch Operator
>     limit:-1
>     Select Operator [SEL_1]
>       Output:["_col0","_col1"]
>       TableScan [TS_0]
>         Output:["__time","userid"],properties:{"druid.query.json":"{\"queryType\":\"groupBy\",\"dataSource\":\"druid_user_login\",\"granularity\":\"NONE\",\"dimensions\":[\"userid\"],\"limitSpec\":{\"type\":\"default\"},\"aggregations\":[{\"type\":\"longSum\",\"name\":\"dummy_agg\",\"fieldName\":\"dummy_agg\"}],\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"]}","druid.query.type":"groupBy"}
> {code}  
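
To give a rough sense of scale, the sketch below parses the `druid.query.json` value from the plan above and counts the milliseconds covered by its interval. It is only an illustration, not part of Hive or Druid: the helper names `interval_millis` and `looks_risky` and the one-day threshold `MAX_SAFE_MILLIS` are hypothetical, chosen here for the example.

```python
import json
from datetime import datetime, timedelta

# Query JSON copied from the plan above, with the escaping removed.
query = json.loads("""
{"queryType":"groupBy","dataSource":"druid_user_login","granularity":"NONE",
 "dimensions":["userid"],"limitSpec":{"type":"default"},
 "aggregations":[{"type":"longSum","name":"dummy_agg","fieldName":"dummy_agg"}],
 "intervals":["1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"]}
""")

# Hypothetical threshold (one day of milliseconds), picked only for illustration.
MAX_SAFE_MILLIS = 24 * 60 * 60 * 1000


def interval_millis(interval):
    """Return the number of milliseconds in an ISO 'start/end' interval string."""
    start_s, end_s = interval.split("/")
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ"
    start = datetime.strptime(start_s, fmt)
    end = datetime.strptime(end_s, fmt)
    return (end - start) // timedelta(milliseconds=1)


def looks_risky(q):
    """Flag groupBy queries that pair granularity NONE with a very wide interval."""
    total = sum(interval_millis(i) for i in q["intervals"])
    return (
        q["queryType"] == "groupBy"
        and q["granularity"].upper() == "NONE"
        and total > MAX_SAFE_MILLIS
    )


total_ms = sum(interval_millis(i) for i in query["intervals"])
print("milliseconds in interval:", total_ms)
print("risky:", looks_risky(query))
```

The interval in the plan covers tens of trillions of milliseconds, which is why one cursor per millisecond is not workable.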



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
