[ https://issues.apache.org/jira/browse/HIVE-22960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Attila Magyar resolved HIVE-22960.
----------------------------------
    Resolution: Won't Fix

> Approximate TopN Key Operator
> -----------------------------
>
>                 Key: HIVE-22960
>                 URL: https://issues.apache.org/jira/browse/HIVE-22960
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Attila Magyar
>            Assignee: Attila Magyar
>            Priority: Major
>             Fix For: 4.0.0
>
>         Attachments: Screen Shot 2020-03-02 at 4.55.46 PM.png
>
>
> "Unlike other operators such as join and group by, the top n operator shows a
> notable “long tail” characteristic: it saturates very quickly. Updates are
> very frequent at the beginning and then drop off to a very low update frequency.
> The approximation can be implemented in two ways. One way is to stop the
> array/heap update after a certain percentage of the data has been read, for
> example 10% or 20%, if we know the table size. The other way is to set a
> frequency threshold for the array/heap update. After the threshold is met,
> stop the top n processing."
> [~rzhappy]
> [Figure: Screen Shot 2020-03-02 at 4.55.46 PM.png. Y axis: number of updates in every 100 msec]
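The two strategies quoted above could be sketched as follows. This is a minimal, hypothetical Python model, not Hive's TopNKeyOperator (which is written in Java) and not code from this issue. The class name ApproximateTopNFilter and the parameters stop_fraction, window and min_update_rate are illustrative assumptions. The sketch assumes ascending order (the n smallest keys win) and, unlike the 100 msec measurement in the attached chart, measures update frequency per window of rows.

```python
import heapq
from typing import List, Optional


class ApproximateTopNFilter:
    """Hypothetical approximate top-N key filter (the n smallest keys win).

    Rows whose key cannot be among the n smallest are dropped. Once the
    filter is frozen, the heap is no longer updated, but rows are still
    filtered against the (stale) boundary it holds.
    """

    def __init__(self, n: int, total_rows: Optional[int] = None,
                 stop_fraction: Optional[float] = None,
                 window: int = 1000,
                 min_update_rate: Optional[float] = None) -> None:
        self.n = n
        self.total_rows = total_rows            # known table size, if any
        self.stop_fraction = stop_fraction      # strategy 1, e.g. 0.1 or 0.2
        self.window = window                    # rows per measurement window
        self.min_update_rate = min_update_rate  # strategy 2 threshold
        self._heap: List[int] = []              # max-heap via negated keys
        self._rows_seen = 0
        self._updates_in_window = 0
        self.frozen = False

    def process(self, key: int) -> bool:
        """Return True if the row with this key should be forwarded."""
        self._rows_seen += 1
        if len(self._heap) < self.n:
            # Fewer than n keys kept so far: every row may be in the top n.
            forward = True
            if not self.frozen:
                heapq.heappush(self._heap, -key)
                self._updates_in_window += 1
        else:
            boundary = -self._heap[0]           # largest of the kept keys
            forward = key <= boundary           # forward ties to stay safe
            if key < boundary and not self.frozen:
                heapq.heapreplace(self._heap, -key)
                self._updates_in_window += 1
        if not self.frozen:
            self._check_freeze()
        return forward

    def _check_freeze(self) -> None:
        # Strategy 1: stop updating after a fixed fraction of a known table size.
        if (self.stop_fraction is not None and self.total_rows
                and self._rows_seen >= self.stop_fraction * self.total_rows):
            self.frozen = True
            return
        # Strategy 2: at the end of each window, stop if updates became rare.
        if self.min_update_rate is not None and self._rows_seen % self.window == 0:
            rate = self._updates_in_window / self.window
            self._updates_in_window = 0
            if rate < self.min_update_rate:
                self.frozen = True
```

In this sketch, freezing only stops heap maintenance; it does not stop filtering. Because a full frozen heap holds n keys from rows already seen, its largest key can only be greater than or equal to the exact boundary, so the filter forwards more rows than strictly necessary but never drops a row that belongs in the top n. Whether Hive's operator would have behaved this way is not stated in the issue.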