[ 
https://issues.apache.org/jira/browse/HIVE-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803473#comment-13803473
 ] 

Sergey Shelukhin commented on HIVE-5588:
----------------------------------------

We discussed this briefly (unfortunately, only after the patch was written) 
and decided not to do it.
It would add extra serialization overhead, and the problems we were planning 
to solve (e.g. putting it in front of FileSink for the HIVE-4002 case) can be 
solved better by other means. Moreover, because sorting relies on 
BinarySortableSerDe, it will not work directly with whatever serde FileSink 
is using; an additional serde would need to be created. And there is no code 
in Hive to actually sort keys without a serde.
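
To illustrate the point about BinarySortableSerDe: a top-N filter that compares 
keys as raw bytes only gives the right answer if the serialization preserves 
sort order. The Python sketch below is not Hive code. The helper names 
(encode_sortable_int, encode_plain_int, top_n) are hypothetical, and the 
sign-bit-flip encoding is only assumed to be similar in spirit to what an 
order-preserving serde does for integers.

```python
import heapq
import struct


def encode_sortable_int(value: int) -> bytes:
    """Encode a signed 32-bit int so that bytewise order equals numeric order.

    Assumption for illustration: flip the sign bit and write big-endian.
    """
    return struct.pack(">I", (value & 0xFFFFFFFF) ^ 0x80000000)


def decode_sortable_int(data: bytes) -> int:
    """Inverse of encode_sortable_int."""
    (raw,) = struct.unpack(">I", data)
    raw ^= 0x80000000
    return raw - (1 << 32) if raw >= (1 << 31) else raw


def encode_plain_int(value: int) -> bytes:
    """Plain big-endian two's complement; NOT order-preserving for negatives."""
    return struct.pack(">i", value)


def top_n(serialized_keys, n: int):
    """Keep the n smallest keys, comparing them purely as byte strings."""
    return heapq.nsmallest(n, serialized_keys)


values = [5, -3, 12, 0, -7]

# Order-preserving encoding: the byte comparison picks the numerically smallest keys.
good = [decode_sortable_int(k) for k in top_n(map(encode_sortable_int, values), 2)]

# Plain encoding: negative numbers sort after positive ones, so the result is wrong.
bad = [struct.unpack(">i", k)[0] for k in top_n(map(encode_plain_int, values), 2)]

print(good, bad)
```

With the order-preserving encoding the two smallest keys come back as -7 and 
-3; with the plain encoding the same byte comparison returns 0 and 5. That is 
why a top-N placed in front of a sink that uses an arbitrary serde would need 
its own sortable serialization.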

I will attach the patch for reference.
If needed in the future, it can easily be pushed through.
It already works correctly in most scenarios; when distinct columns are 
present there is an exception, and a small amount of additional code 
duplicated from ReduceSink is needed to make it work. This is explained in the 
"TODO#" comment where the code is not correct.


> change TopN to be an operator
> -----------------------------
>
>                 Key: HIVE-5588
>                 URL: https://issues.apache.org/jira/browse/HIVE-5588
>             Project: Hive
>          Issue Type: Task
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>
> See HIVE-5503, as well as the discussion in HIVE-3562.
> If topN is a separate operator, it can be reused for file sink, and a 
> vectorized version can be implemented.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
