[ https://issues.apache.org/jira/browse/HIVE-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803473#comment-13803473 ]
Sergey Shelukhin commented on HIVE-5588:
----------------------------------------

We have discussed this a little bit (unfortunately after making the patch) and decided that we won't do it. It would add extra serialization overhead, and the problems that we were planning to solve (e.g. putting it in front of FileSink for the HIVE-4002 case) can be solved better by different means.

Moreover, given that sorting relies on BinarySortableSerDe, it will not work straightforwardly with whatever serde FileSink is using; an additional serde would need to be created. And there's no code in Hive to actually sort keys without a serde.

I will attach the patch for reference. If needed in the future, it can easily be pushed through. It already works correctly in most scenarios; when distinct columns are present there's an exception, and a small amount of additional code duplicated from ReduceSink is needed to make it work. This is explained in the "TODO#" comment where the code is not correct.

> change TopN to be an operator
> -----------------------------
>
>                 Key: HIVE-5588
>                 URL: https://issues.apache.org/jira/browse/HIVE-5588
>             Project: Hive
>          Issue Type: Task
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>
> See HIVE-5503, as well as the discussion in HIVE-3562.
> If topN is a separate operator, it can be reused for file sink, and
> vectorized version can be implemented.

--
This message was sent by Atlassian JIRA
(v6.1#6144)