[ https://issues.apache.org/jira/browse/HIVE-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813143#comment-13813143 ]
Sergey Shelukhin commented on HIVE-5657:
----------------------------------------

I refactored the code there because it needed to be called from two places (the vectorized and non-vectorized paths), and some TopN-specific code was inside ReducerSink. Similarly, the vectorized and non-vectorized paths inside TopNHash would have a lot of code in common, such as serialization. Let me try to rebase this patch later today...

> TopN produces incorrect results with count(distinct)
> ----------------------------------------------------
>
>                 Key: HIVE-5657
>                 URL: https://issues.apache.org/jira/browse/HIVE-5657
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Navis
>            Priority: Critical
>         Attachments: D13797.1.patch, D13797.2.patch, HIVE-5657.1.patch.txt, example.patch
>
>
> Attached patch illustrates the problem.
> limit_pushdown test has various other cases of aggregations and distincts,
> incl. count-distinct, that work correctly (that said, src dataset is bad for
> testing these things because every count, for example, produces one record
> only), so something must be special about this.
> I am not very familiar with distinct code and these nuances; if someone
> knows a quick fix feel free to take this, otherwise I will probably start
> looking next week.

--
This message was sent by Atlassian JIRA
(v6.1#6144)