[ https://issues.apache.org/jira/browse/HIVE-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HIVE-5657:
------------------------------

    Attachment: D13797.2.patch

navis updated the revision "HIVE-5657 [jira] TopN produces incorrect results with count(distinct)".

  1. Minimized diff
  2. Support multi-distinct cases

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D13797

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D13797?vs=42645&id=42747#toc

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveKey.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java
  ql/src/test/queries/clientpositive/limit_pushdown.q
  ql/src/test/queries/clientpositive/limit_pushdown_negative.q
  ql/src/test/results/clientpositive/limit_pushdown.q.out
  ql/src/test/results/clientpositive/limit_pushdown_negative.q.out

To: JIRA, navis
Cc: sershe

> TopN produces incorrect results with count(distinct)
> ----------------------------------------------------
>
>                 Key: HIVE-5657
>                 URL: https://issues.apache.org/jira/browse/HIVE-5657
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Navis
>            Priority: Critical
>         Attachments: D13797.1.patch, D13797.2.patch, example.patch, HIVE-5657.1.patch.txt
>
>
> Attached patch illustrates the problem.
> limit_pushdown test has various other cases of aggregations and distincts, incl. count-distinct, that work correctly (that said, src dataset is bad for testing these things because every count, for example, produces one record only), so something must be special about this.
> I am not very familiar with distinct- code and these nuances; if someone knows a quick fix feel free to take this, otherwise I will probably start looking next week.

--
This message was sent by Atlassian JIRA
(v6.1#6144)