Michael Haeusler created HIVE-11022:
---------------------------------------

             Summary: Support collecting lists in user defined order
                 Key: HIVE-11022
                 URL: https://issues.apache.org/jira/browse/HIVE-11022
             Project: Hive
          Issue Type: New Feature
          Components: UDF
            Reporter: Michael Haeusler


Hive currently supports aggregating lists "in order of input rows" with the 
UDF collect_list. Unfortunately, that order is not well defined when map-side 
aggregation is used.

Hive could support collecting lists in a user-defined order by providing a UDF
COLLECT_LIST_SORTED(valueColumn, sortColumn[, limit]) that returns a list of 
values sorted by sortColumn. An optional limit parameter would restrict the 
result to the first n values in that order.
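
The proposed semantics can be sketched outside Hive as follows. This is only 
an illustration in Python, not Hive code: the function name mirrors the 
proposed UDF, while ascending sort order and keeping input order for ties are 
assumptions the proposal does not spell out.

```python
from typing import Any, Iterable, List, Optional, Tuple


def collect_list_sorted(
    rows: Iterable[Tuple[Any, Any]], limit: Optional[int] = None
) -> List[Any]:
    """Return the values of (value, sort_key) rows, ordered by sort_key.

    Assumptions (not stated in the proposal): ascending order, and rows
    with equal sort keys keep their input order (sorted() is stable).
    """
    ordered = sorted(rows, key=lambda pair: pair[1])
    values = [value for value, _ in ordered]
    # With no limit, return everything; otherwise only the first n values.
    return values if limit is None else values[:limit]


rows = [("b", 2), ("a", 1), ("c", 3)]
all_values = collect_list_sorted(rows)
first_two = collect_list_sorted(rows, limit=2)
```

Here all_values holds every value in sort-key order, and first_two holds only 
the two values with the smallest sort keys.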

Especially in the limit case, this can be pre-aggregated efficiently on the map 
side: each mapper needs to emit at most n values per group, which reduces the 
amount of data transferred to reducers.
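
Why the limit case pre-aggregates well can be shown with a small sketch. Again 
this is plain Python rather than Hive's aggregation API; partial_top_n and 
merge_partials are hypothetical names standing in for the map-side and 
reduce-side steps, and ascending order is assumed.

```python
import heapq
from itertools import chain
from typing import Any, Iterable, List, Tuple

Row = Tuple[Any, Any]  # (value, sort_key)


def partial_top_n(rows: Iterable[Row], limit: int) -> List[Row]:
    """Map side: keep only the `limit` rows with the smallest sort keys."""
    return heapq.nsmallest(limit, rows, key=lambda pair: pair[1])


def merge_partials(partials: Iterable[List[Row]], limit: int) -> List[Any]:
    """Reduce side: merge the partial results and apply the limit again."""
    merged = heapq.nsmallest(
        limit, chain.from_iterable(partials), key=lambda pair: pair[1]
    )
    return [value for value, _ in merged]


# Two mappers each see part of one group; each forwards at most 2 rows.
mapper_a = [("x", 5), ("y", 1), ("z", 3)]
mapper_b = [("p", 2), ("q", 4), ("r", 0)]
partials = [partial_top_n(mapper_a, 2), partial_top_n(mapper_b, 2)]
result = merge_partials(partials, 2)
```

Any row in the global first n must also be among the first n of the mapper 
that saw it, so discarding the rest on the map side does not change the 
result.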



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
