Youngwb opened a new pull request #4803:
URL: https://github.com/apache/incubator-doris/pull/4803


   ## Proposed changes
   For #4674 
   This is a UDAF for approximate top-N based on the Space-Saving algorithm. 
Currently it only computes the most frequent items in a column and their 
frequencies. Building on this, we can implement TopN functions similar to 
those supported by Kylin in the future. 
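   For readers unfamiliar with Space-Saving (Metwally, Agrawal and El Abbadi), here is a minimal C++ sketch of how it tracks frequent items with a fixed number of counters. The class `SpaceSaving` and its methods are hypothetical and not taken from this PR. It finds the minimum counter with a linear scan rather than the Stream-Summary structure, and it omits the merge step a distributed UDAF needs. The `capacity` argument plays the role of the counter number controlled by `space_expand_rate`.

   ```cpp
   #include <algorithm>
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <string>
   #include <unordered_map>
   #include <utility>
   #include <vector>

   // Hypothetical Space-Saving sketch with a fixed number of counters.
   class SpaceSaving {
   public:
       explicit SpaceSaving(std::size_t capacity) : capacity_(capacity) {}

       void add(const std::string& item) {
           if (capacity_ == 0) return;  // no counters, nothing to track
           auto it = counters_.find(item);
           if (it != counters_.end()) {  // already monitored: exact increment
               it->second.count += 1;
               return;
           }
           if (counters_.size() < capacity_) {  // free counter available
               counters_.emplace(item, Counter{1, 0});
               return;
           }
           // All counters in use: evict the item with the smallest count.
           // The newcomer inherits that count + 1 and records it as its
           // maximum overestimation (error).
           auto min_it = std::min_element(
               counters_.begin(), counters_.end(),
               [](const auto& a, const auto& b) { return a.second.count < b.second.count; });
           assert(min_it != counters_.end());
           std::uint64_t min_count = min_it->second.count;
           counters_.erase(min_it);
           counters_.emplace(item, Counter{min_count + 1, min_count});
       }

       // Up to n monitored items, by estimated count desc (ties by key asc).
       std::vector<std::pair<std::string, std::uint64_t>> top(std::size_t n) const {
           std::vector<std::pair<std::string, std::uint64_t>> result;
           result.reserve(counters_.size());
           for (const auto& [key, c] : counters_) result.emplace_back(key, c.count);
           std::sort(result.begin(), result.end(), [](const auto& a, const auto& b) {
               if (a.second != b.second) return a.second > b.second;
               return a.first < b.first;
           });
           if (result.size() > n) result.resize(n);
           return result;
       }

   private:
       struct Counter {
           std::uint64_t count;  // estimated frequency (never underestimates)
           std::uint64_t error;  // upper bound on the overestimation
       };
       std::size_t capacity_;
       std::unordered_map<std::string, Counter> counters_;
   };
   ```

   With more counters than distinct values the counts are exact; once counters run out, low-count items are evicted and the estimates of newcomers become upper bounds. This is why a larger counter number, as in the 50X and 100X columns below, tends to improve accuracy.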
   
   I have also added a test that measures the accuracy of this algorithm. Rough 
results are shown below. The test data has 1 million rows following a Zipfian 
distribution. "Element cardinality" is the number of distinct values, and the 
columns 20X, 50X and 100X correspond to space_expand_rate = 20, 50 and 100, 
which sets the number of counters used by the Space-Saving algorithm.
   ```
   Zipf exponent = 0.5
   Element cardinality    20X     50X     100X
                  1000    100%    100%    100%
                 10000    100%    100%    100%
                100000    100%    100%    100%
                500000     94%     98%     99%

   Zipf exponent = 0.6, 1
   Element cardinality    20X     50X     100X
                  1000    100%    100%    100%
                 10000    100%    100%    100%
                100000    100%    100%    100%
                500000    100%    100%    100%
   ```
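   The experiment above can be reproduced in outline with the helpers below. This is a hedged sketch: the PR does not say how accuracy is defined or how many top items are compared, so it assumes accuracy is the recall of the exact top-N items within the approximate result. The function names `zipf_sample`, `exact_topn` and `topn_recall` are hypothetical and not part of the PR's test.

   ```cpp
   #include <algorithm>
   #include <cassert>
   #include <cmath>
   #include <cstddef>
   #include <cstdint>
   #include <random>
   #include <unordered_map>
   #include <unordered_set>
   #include <utility>
   #include <vector>

   // Draw n values from a Zipf distribution over {0, ..., cardinality - 1}:
   // value i has weight 1 / (i + 1)^exponent, so 0 is the most frequent.
   std::vector<std::int64_t> zipf_sample(std::size_t cardinality, double exponent,
                                         std::size_t n, std::uint32_t seed) {
       assert(cardinality > 0);
       std::vector<double> weights(cardinality);
       for (std::size_t i = 0; i < cardinality; ++i)
           weights[i] = 1.0 / std::pow(static_cast<double>(i + 1), exponent);
       std::discrete_distribution<std::int64_t> dist(weights.begin(), weights.end());
       std::mt19937 rng(seed);
       std::vector<std::int64_t> out(n);
       for (auto& v : out) v = dist(rng);
       return out;
   }

   // Exact top-n keys by frequency (ties broken by smaller key).
   std::vector<std::int64_t> exact_topn(const std::vector<std::int64_t>& data, std::size_t n) {
       std::unordered_map<std::int64_t, std::uint64_t> freq;
       for (auto v : data) ++freq[v];
       std::vector<std::pair<std::int64_t, std::uint64_t>> items(freq.begin(), freq.end());
       std::sort(items.begin(), items.end(), [](const auto& a, const auto& b) {
           return a.second != b.second ? a.second > b.second : a.first < b.first;
       });
       std::vector<std::int64_t> keys;
       for (std::size_t i = 0; i < items.size() && i < n; ++i) keys.push_back(items[i].first);
       return keys;
   }

   // Assumed accuracy metric: fraction of exact top-N keys found in the approximate result.
   double topn_recall(const std::vector<std::int64_t>& exact,
                      const std::vector<std::int64_t>& approx) {
       if (exact.empty()) return 1.0;
       std::unordered_set<std::int64_t> approx_set(approx.begin(), approx.end());
       std::size_t hits = 0;
       for (auto k : exact) hits += approx_set.count(k);
       return static_cast<double>(hits) / static_cast<double>(exact.size());
   }
   ```

   A run would call `zipf_sample(cardinality, exponent, 1000000, seed)`, feed the values through the approximate top-N aggregation, and compare its keys with `exact_topn` on the same data.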
   
   
   
   ## Types of changes
   
   What types of changes does your code introduce to Doris?
   _Put an `x` in the boxes that apply_
   
   - [ ] Bugfix (non-breaking change which fixes an issue)
   - [x] New feature (non-breaking change which adds functionality)
   - [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
   - [ ] Documentation Update (if none of the other choices apply)
   - [ ] Code refactor (Modify the code structure, format the code, etc...)
   
   ## Checklist
   
   _Put an `x` in the boxes that apply. You can also fill these out after 
creating the PR. If you're unsure about any of them, don't hesitate to ask. 
We're here to help! This is simply a reminder of what we are going to look for 
before merging your code._
   
   - [x] I have created an issue on (Fix #ISSUE), and have described the 
bug/feature there in detail
   - [x] Compiling and unit tests pass locally with my changes
   - [x] I have added tests that prove my fix is effective or that my feature 
works
   - [x] If this change needs a document change, I have updated the document
   - [ ] Any dependent changes have been merged
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at 
[email protected] by explaining why you chose the solution you did and what 
alternatives you considered, etc...
   

