[jira] [Updated] (HIVE-22960) Approximate TopN Key Operator

Attila Magyar (Jira) Mon, 02 Mar 2020 08:13:53 -0800


     [ 
https://issues.apache.org/jira/browse/HIVE-22960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Attila Magyar updated HIVE-22960:
---------------------------------
    Description: 
"Unlike other operators such as join and group by, the top n operator shows a 
notable “long tail” characteristic: its array/heap saturates very quickly. 
Updates are frequent at the beginning and then become very infrequent.

The approximation can be implemented in two ways. One way is to stop the 
array/heap update after a certain percentage of the data has been read, for 
example 10% or 20%, if we know the table size. The other way is to set a 
frequency threshold for the array/heap update; once the threshold is met, 
top n processing stops."

[~rzhappy]

!Screen Shot 2020-03-02 at 4.55.46 PM.png|width=688,height=468!
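
The two stopping rules described above can be sketched as follows. This is a minimal illustration, not Hive's actual operator code: the class name ApproxTopN and the parameters rowBudget and idleLimit are hypothetical, keys are plain long values, and the "frequency threshold" is interpreted here as a run of consecutive rows that cause no heap update, which is only one possible reading of the description.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of an approximate top n tracker; not Hive's implementation. */
final class ApproxTopN {
    private final int n;
    // Assumption: caller computes this, e.g. 10% of a known table row count.
    private final long rowBudget;
    // Assumption: stop after this many consecutive rows that did not change the heap.
    private final long idleLimit;
    // Min-heap: the root is the smallest of the keys currently kept.
    private final PriorityQueue<Long> heap = new PriorityQueue<>();
    private long rowsSeen = 0;
    private long idleRows = 0;
    private boolean stopped = false;

    ApproxTopN(int n, long rowBudget, long idleLimit) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
        this.rowBudget = rowBudget;
        this.idleLimit = idleLimit;
    }

    /** Offers one key. Returns false once top n processing has been stopped. */
    boolean offer(long key) {
        if (stopped) {
            return false;
        }
        rowsSeen++;
        boolean updated = false;
        if (heap.size() < n) {
            heap.add(key);
            updated = true;
        } else if (key > heap.peek()) {
            heap.poll();      // evict the smallest kept key
            heap.add(key);
            updated = true;
        }
        idleRows = updated ? 0 : idleRows + 1;
        // Rule 1: row budget exhausted. Rule 2: updates have become too rare.
        if (rowsSeen >= rowBudget || idleRows >= idleLimit) {
            stopped = true;
        }
        return true;
    }

    boolean isStopped() {
        return stopped;
    }

    /** Returns the kept keys, largest first. */
    List<Long> topDescending() {
        List<Long> out = new ArrayList<>(heap);
        out.sort(Collections.reverseOrder());
        return out;
    }
}
```

With n = 2 and idleLimit = 3, offering the keys 5, 9, 1, 2, 3 stops processing after the third non-updating row, so a later key such as 50 is ignored. This is the accuracy trade-off the approximation accepts.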

  was:
"Unlike other operators such as join and group by, the top n operator shows a 
notable “long tail” characteristic: its array/heap saturates very quickly. 
Updates are frequent at the beginning and then become very infrequent.

The approximation can be implemented in two ways. One way is to stop the 
array/heap update after a certain percentage of the data has been read, for 
example 10% or 20%, if we know the table size. The other way is to set a 
frequency threshold for the array/heap update; once the threshold is met, 
top n processing stops."

[~rzhappy]

 !Screen Shot 2020-03-02 at 4.55.46 PM.png! 


> Approximate TopN Key Operator
> -----------------------------
>
>                 Key: HIVE-22960
>                 URL: https://issues.apache.org/jira/browse/HIVE-22960
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Attila Magyar
>            Assignee: Attila Magyar
>            Priority: Major
>             Fix For: 4.0.0
>
>         Attachments: Screen Shot 2020-03-02 at 4.55.46 PM.png
>
>
> "Unlike other operators such as join and group by, the top n operator shows 
> a notable “long tail” characteristic: its array/heap saturates very quickly. 
> Updates are frequent at the beginning and then become very infrequent.
> The approximation can be implemented in two ways. One way is to stop the 
> array/heap update after a certain percentage of the data has been read, for 
> example 10% or 20%, if we know the table size. The other way is to set a 
> frequency threshold for the array/heap update; once the threshold is met, 
> top n processing stops."
> [~rzhappy]
> !Screen Shot 2020-03-02 at 4.55.46 PM.png|width=688,height=468!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
