[ https://issues.apache.org/jira/browse/HIVE-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HIVE-5657:
------------------------------

    Attachment: D13797.1.patch

navis requested code review of "HIVE-5657 [jira] TopN produces incorrect results with count(distinct)".

Reviewers: JIRA

HIVE-5657 TopN produces incorrect results with count(distinct)

Attached patch illustrates the problem.
limit_pushdown test has various other cases of aggregations and distincts, incl. count-distinct, that work correctly (that said, src dataset is bad for testing these things because every count, for example, produces one record only), so something must be special about this.
I am not very familiar with distinct- code and these nuances; if someone knows a quick fix feel free to take this, otherwise I will probably start looking next week.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D13797

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorReduceSinkOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/test/queries/clientpositive/limit_pushdown.q
  ql/src/test/queries/clientpositive/limit_pushdown_negative.q
  ql/src/test/results/clientpositive/limit_pushdown.q.out
  ql/src/test/results/clientpositive/limit_pushdown_negative.q.out
  serde/src/java/org/apache/hadoop/hive/serde2/KeySerializer.java
  serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java
  serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/OutputByteBuffer.java

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/41811/

To: JIRA, navis

> TopN produces incorrect results with count(distinct)
> ----------------------------------------------------
>
>                 Key: HIVE-5657
>                 URL: https://issues.apache.org/jira/browse/HIVE-5657
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Navis
>            Priority: Critical
>         Attachments: D13797.1.patch, example.patch, HIVE-5657.1.patch.txt
>
>
> Attached patch illustrates the problem.
> limit_pushdown test has various other cases of aggregations and distincts,
> incl. count-distinct, that work correctly (that said, src dataset is bad for
> testing these things because every count, for example, produces one record
> only), so something must be special about this.
> I am not very familiar with distinct- code and these nuances; if someone
> knows a quick fix feel free to take this, otherwise I will probably start
> looking next week.

--
This message was sent by Atlassian JIRA
(v6.1#6144)