-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42508/#review115804
-----------------------------------------------------------


Overall the logic makes sense. Just some (maybe basic) questions below.


ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java (line 160)
<https://reviews.apache.org/r/42508/#comment176868>

    Why do we need to do ArrayUtils.isEquals as well as the hash comparison? Another suggestion: can we use something like ObjectInspectorUtils.compare, as other UDAFs do?


ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java (line 171)
<https://reviews.apache.org/r/42508/#comment176869>

    Just curious: how do you know that Text/LazyString are the only ones that need a copy? Are there other data types that also need it?


- Szehon Ho


On Jan. 20, 2016, 5:05 p.m., Aihua Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42508/
> -----------------------------------------------------------
> 
> (Updated Jan. 20, 2016, 5:05 p.m.)
> 
> 
> Review request for hive, Chaoyu Tang, Szehon Ho, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-12889: Support COUNT(DISTINCT) for partitioning query.
> 
> 
> Diffs
> -----
> 
>   data/files/windowing_distinct.txt PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/functions/HiveSqlCountAggFunction.java 7937040 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/functions/HiveSqlSumAggFunction.java 8f62970 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/PlanModifierForASTConv.java e2fbb4f 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/SqlFunctionConverter.java 37249f9 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 3fefbd7 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 15ca754 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/PTFInvocationSpec.java 29b8510 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 15773e5 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingSpec.java a181f7c 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java eaf112e 
>   ql/src/test/queries/clientpositive/windowing_distinct.q PRE-CREATION 
>   ql/src/test/results/clientpositive/windowing_distinct.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/42508/diff/
> 
> 
> Testing
> -------
> 
> Support count(distinct) over partitioning window.
> 
> 1. Enable the parser to properly parse queries such as "count(distinct) over (partition by c1)".
> 2. ORDER BY and windowing frames are not supported with distinct functions, due to performance concerns and implementation requirements.
> 3. We insert the distinct fields into the order by list, so during counting we only need to compare the current row against the previously remembered row.
> 
> 
> Thanks,
> 
> Aihua Xu
> 
>
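
For readers following item 3 of the quoted description, here is a minimal sketch of the "compare against the previous remembered row" idea. It is not the code from the patch: the class name SortedDistinctCounter and the method countDistinct are hypothetical, rows are modelled as plain Object[] arrays rather than Hive objects, and NULL handling is left out. It assumes the rows of a partition arrive already sorted on the distinct columns, so equal rows are adjacent. The clone() call only stands in for whatever copy the real code needs when the engine reuses row objects.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the GenericUDAFCount implementation.
public class SortedDistinctCounter {

    // Counts distinct rows, assuming the input is sorted on the distinct
    // columns so that equal rows are adjacent. NULL handling is omitted.
    public static long countDistinct(List<Object[]> sortedRows) {
        long count = 0;
        Object[] previous = null; // the last row that was counted

        for (Object[] row : sortedRows) {
            // A row adds to the count only if it differs from the remembered row.
            if (previous == null || !Arrays.equals(previous, row)) {
                count++;
                // Remember a copy, in case the caller reuses the array.
                // This is a shallow copy; mutable field objects would need
                // a deeper copy than this sketch performs.
                previous = row.clone();
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
            new Object[] {"a", 1},
            new Object[] {"a", 1},
            new Object[] {"a", 2},
            new Object[] {"b", 2});
        System.out.println(countDistinct(rows)); // prints 3
    }
}
```

Because only one previous row is kept, memory use stays constant per partition, but the result is only correct when the input really is sorted on the distinct columns.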