[ 
https://issues.apache.org/jira/browse/HIVE-22960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-22960:
------------------------------------


> Approximate TopN Key Operator
> -----------------------------
>
>                 Key: HIVE-22960
>                 URL: https://issues.apache.org/jira/browse/HIVE-22960
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Attila Magyar
>            Assignee: Attila Magyar
>            Priority: Major
>             Fix For: 4.0.0
>
>         Attachments: Screen Shot 2020-03-02 at 4.55.46 PM.png
>
>
> ??Unlike other operators such as join and group by, the top n operator shows 
> a notable “long tail” characteristic: its array/heap saturates very quickly. 
> Updates are frequent at the beginning and then taper off to a very low 
> frequency.
> The approximation can be implemented in two ways. One is to stop the 
> array/heap update after a certain percentage of the data has been read, for 
> example 10% or 20%, if we know the table size. The other is to set a 
> frequency threshold for the array/heap update and stop the top n processing 
> once that threshold is met.??
> [~rzhappy]
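
The two cutoffs described above could be sketched roughly as follows. This is a minimal illustration, not Hive's operator code: the class ApproxTopN, its parameters (stopAfterRows, window, minUpdatesPerWindow) and the choice to keep filtering against the frozen boundary after the cutoff are all assumptions made for the example. The frequency threshold is read here as "fewer than minUpdatesPerWindow heap updates within a window of rows".

```java
import java.util.PriorityQueue;

/** Hypothetical sketch of an approximate top n key filter (largest keys win). */
class ApproxTopN {
    private final int n;
    private final long stopAfterRows;       // cutoff 1: freeze after this many rows (e.g. 10% of a known table size)
    private final int window;               // cutoff 2: number of rows per observation window
    private final int minUpdatesPerWindow;  // freeze if a full window sees fewer heap updates than this
    private final PriorityQueue<Long> heap = new PriorityQueue<>(); // min-heap holding the n largest keys seen
    private long rowsSeen = 0;
    private int rowsInWindow = 0;
    private int updatesInWindow = 0;
    private boolean frozen = false;

    ApproxTopN(int n, long stopAfterRows, int window, int minUpdatesPerWindow) {
        if (n <= 0 || window <= 0) {
            throw new IllegalArgumentException("n and window must be positive");
        }
        this.n = n;
        this.stopAfterRows = stopAfterRows;
        this.window = window;
        this.minUpdatesPerWindow = minUpdatesPerWindow;
    }

    /** Returns true if the row may still belong to the top n and should be forwarded. */
    boolean offer(long key) {
        rowsSeen++;
        // Decide against the boundary as it stood before this row.
        boolean forward = heap.size() < n || key >= heap.peek();
        if (!frozen) {
            if (heap.size() < n) {
                heap.add(key);
                updatesInWindow++;
            } else if (key > heap.peek()) {
                heap.poll();            // evict the current smallest of the top n
                heap.add(key);
                updatesInWindow++;
            }
            if (++rowsInWindow == window) {
                // Frequency cutoff: the heap has (nearly) stopped changing.
                if (updatesInWindow < minUpdatesPerWindow) {
                    frozen = true;
                }
                rowsInWindow = 0;
                updatesInWindow = 0;
            }
            // Percentage cutoff: enough of the input has been read.
            if (rowsSeen >= stopAfterRows) {
                frozen = true;
            }
        }
        return forward;
    }

    boolean isFrozen() {
        return frozen;
    }

    /** Smallest key currently kept, or null if no key has been kept yet. */
    Long boundary() {
        return heap.peek();
    }
}
```

In this sketch a frozen boundary is never larger than the true n-th largest key, so rows that fall below it can still be dropped safely; the filter only becomes less selective, which is the trade-off the approximation accepts. To disable the percentage cutoff when the table size is unknown, pass Long.MAX_VALUE as stopAfterRows.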



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
