Attila Magyar created HIVE-22960:
------------------------------------
Summary: Approximate TopN Key Operator
Key: HIVE-22960
URL: https://issues.apache.org/jira/browse/HIVE-22960
Project: Hive
Issue Type: Bug
Components: Hive
Reporter: Attila Magyar
Assignee: Attila Magyar
Fix For: 4.0.0
Attachments: Screen Shot 2020-03-02 at 4.55.46 PM.png
??Unlike other operators such as join and group by, the top n operator shows a notable “long tail” characteristic: it saturates very quickly. Updates are frequent at the beginning and then become very infrequent.
The approximation can be implemented in two ways. One way is to stop updating the array/heap after a certain percentage of the data has been read, for example 10% or 20%, if the table size is known. The other way is to set a threshold on the array/heap update frequency; once the update frequency drops below the threshold, stop the top n processing.??
[~rzhappy]
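A minimal Java sketch of the two stopping rules above, for illustration only. The class ApproximateTopNFilter and its parameters (expectedRows, freezeFraction, windowSize, minUpdateRate) are hypothetical and are not Hive's TopNKeyOperator API. The sketch assumes ascending order on a single long key and a downstream sort/limit that still computes the exact top n. Under that assumption, freezing the heap keeps the result correct. The frozen boundary is the n-th smallest key seen so far, and every true top-n key is at most that value. Freezing therefore only makes the filter less selective.

```java
import java.util.Collections;
import java.util.PriorityQueue;

/**
 * Hypothetical approximate top-n key filter (ascending order, long keys).
 * Keeps the n smallest keys seen so far in a max-heap and drops rows whose key is
 * larger than the current n-th smallest key. Heap updates stop ("freeze") when
 * (1) a fraction of the expected rows has been read, or (2) the heap update rate
 * within a window of rows falls below a threshold.
 */
class ApproximateTopNFilter {
    private final int n;
    private final long freezeAfterRows;  // rule 1; 0 disables it (e.g. table size unknown)
    private final int windowSize;        // rule 2; 0 disables it
    private final double minUpdateRate;  // rule 2: freeze when updates / windowSize < this
    private final PriorityQueue<Long> heap = new PriorityQueue<>(Collections.reverseOrder());
    private long rowsSeen;
    private int windowRows;
    private int windowUpdates;
    private boolean frozen;

    ApproximateTopNFilter(int n, long expectedRows, double freezeFraction,
                          int windowSize, double minUpdateRate) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
        this.freezeAfterRows = expectedRows > 0 ? (long) Math.ceil(expectedRows * freezeFraction) : 0;
        this.windowSize = windowSize;
        this.minUpdateRate = minUpdateRate;
    }

    /** Returns true if the row may belong to the top n and must be forwarded. */
    boolean accept(long key) {
        rowsSeen++;
        boolean forward = true;
        if (heap.size() < n) {
            // Boundary not established yet: forward everything, fill the heap if still updating.
            if (!frozen) {
                heap.add(key);
                windowUpdates++;
            }
        } else if (key > heap.peek()) {
            // Larger than the current n-th smallest key: cannot be in the top n.
            forward = false;
        } else if (!frozen && key < heap.peek()) {
            // Displaces the current n-th smallest key.
            heap.poll();
            heap.add(key);
            windowUpdates++;
        }
        if (!frozen) {
            maybeFreeze();
        }
        return forward;
    }

    private void maybeFreeze() {
        // Rule 1: stop after a fixed share of the (known) table size has been read.
        if (freezeAfterRows > 0 && rowsSeen >= freezeAfterRows) {
            frozen = true;
            return;
        }
        // Rule 2: stop once heap updates become rare within a window of rows.
        if (windowSize > 0 && ++windowRows == windowSize) {
            if ((double) windowUpdates / windowSize < minUpdateRate) {
                frozen = true;
            }
            windowRows = 0;
            windowUpdates = 0;
        }
    }

    boolean isFrozen() { return frozen; }

    /** Current n-th smallest key, or null while fewer than n keys are held. */
    Long boundary() { return heap.size() < n ? null : heap.peek(); }
}
```

For example, `new ApproximateTopNFilter(10, tableRows, 0.1, 0, 0.0)` freezes after 10% of the rows have been read (rule 1). `new ApproximateTopNFilter(10, 0, 0.0, 1024, 0.001)` freezes after the first window of 1,024 rows with no heap update (rule 2). The window-based rule needs no table statistics, at the cost of two tuning knobs.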