Re: Review Request 42508: HIVE-12889: Support COUNT(DISTINCT) for partitioning query.

Szehon Ho Fri, 22 Jan 2016 00:04:21 -0800

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42508/#review115804
-----------------------------------------------------------



Overall, the logic makes sense. Just a few (maybe basic) questions below.


ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java (line 160)
<https://reviews.apache.org/r/42508/#comment176868>

    Why do we need ArrayUtils.isEquals in addition to the hash comparison?
    
    Another suggestion: could we use something like ObjectInspectorUtils.compare, as other UDAFs do?



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java (line 171)
<https://reviews.apache.org/r/42508/#comment176869>

    Just curious: how do you know that Text and LazyString are the only types that need a copy? Are there other data types that also need it?
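
    To illustrate the concern behind this question, here is a minimal sketch (not the patch's code) of why a remembered previous value may need a defensive copy when the row reader reuses one mutable object. StringBuilder stands in for a reused Text instance, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class ReusedBufferSketch {
    // Counts distinct values in an already sorted list by comparing each
    // value with the remembered previous one. The "reader" below reuses a
    // single mutable buffer for every row (assumption made for illustration).
    static int countDistinctSorted(List<String> sortedValues, boolean copyPrevious) {
        StringBuilder reused = new StringBuilder(); // the single reused buffer
        CharSequence previous = null;               // remembered previous value
        int count = 0;
        for (String value : sortedValues) {
            // The reader overwrites the buffer in place for the next row.
            reused.setLength(0);
            reused.append(value);
            if (previous == null || !previous.toString().contentEquals(reused)) {
                count++;
            }
            // Without a copy, "previous" aliases the buffer and silently
            // changes when the next row is read.
            previous = copyPrevious ? reused.toString() : reused;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "a", "b", "c");
        System.out.println("with copy:    " + countDistinctSorted(values, true));
        System.out.println("without copy: " + countDistinctSorted(values, false));
    }
}
```

    In this sketch the uncopied reference always equals the current row, so every row after the first looks like a duplicate. Any type whose object is mutated in place between rows would show the same problem, which is why the question about other data types matters.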


- Szehon Ho


On Jan. 20, 2016, 5:05 p.m., Aihua Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42508/
> -----------------------------------------------------------
> 
> (Updated Jan. 20, 2016, 5:05 p.m.)
> 
> 
> Review request for hive, Chaoyu Tang, Szehon Ho, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-12889: Support COUNT(DISTINCT) for partitioning query.
> 
> 
> Diffs
> -----
> 
>   data/files/windowing_distinct.txt PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/functions/HiveSqlCountAggFunction.java 7937040 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/functions/HiveSqlSumAggFunction.java 8f62970 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/PlanModifierForASTConv.java e2fbb4f 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/SqlFunctionConverter.java 37249f9 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 3fefbd7 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 15ca754 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/PTFInvocationSpec.java 29b8510 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 15773e5 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingSpec.java a181f7c 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java eaf112e 
>   ql/src/test/queries/clientpositive/windowing_distinct.q PRE-CREATION 
>   ql/src/test/results/clientpositive/windowing_distinct.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/42508/diff/
> 
> 
> Testing
> -------
> 
> Support count(distinct) over partitioning window. 
> 
> 1. Enable the parser to properly parse queries such as "count(distinct) over (partition by c1)".
> 2. ORDER BY and windowing frames will not work with distinct functions, due to performance concerns and implementation requirements.
> 3. We insert the distinct fields into the ORDER BY list, so during counting we only need to compare the current row against the previously remembered row.
> 
> 
> Thanks,
> 
> Aihua Xu
> 
>
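
Item 3 of the quoted description (putting the distinct fields into the ORDER BY list so that each row is compared only with the previously remembered row) can be sketched roughly as follows. This is an illustrative sketch in plain Java, not the GenericUDAFCount implementation; the class and method names are hypothetical, and it assumes two non-null string columns (partition key, then the distinct field).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SortedDistinctCountSketch {
    // Counts distinct values of column 1 within each partition (column 0).
    static Map<String, Integer> countDistinctPerPartition(List<String[]> rows) {
        List<String[]> sorted = new ArrayList<>(rows);
        // Order by partition key, then by the distinct field, so that equal
        // values within a partition end up adjacent.
        sorted.sort((a, b) -> {
            int byKey = a[0].compareTo(b[0]);
            return byKey != 0 ? byKey : a[1].compareTo(b[1]);
        });

        Map<String, Integer> counts = new LinkedHashMap<>();
        String[] previous = null; // the only state kept between rows
        for (String[] row : sorted) {
            boolean newPartition = previous == null || !previous[0].equals(row[0]);
            boolean newValue = newPartition || !previous[1].equals(row[1]);
            if (newValue) {
                counts.merge(row[0], 1, Integer::sum);
            }
            previous = row; // safe here because each row is its own array
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"p1", "x"},
            new String[] {"p2", "y"},
            new String[] {"p1", "x"},
            new String[] {"p1", "z"});
        System.out.println(countDistinctPerPartition(rows));
    }
}
```

Because the input is sorted on the distinct field, the sketch needs no hash set of seen values; a single remembered row is enough.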
