Youngwb opened a new pull request #4803:
URL: https://github.com/apache/incubator-doris/pull/4803
## Proposed changes
For #4674
This PR adds a UDAF for approximate top-N based on the Space-Saving algorithm.
At present it can only compute the frequent items in a given column and their
frequencies. On that basis, we can implement topN functions similar to those
supported by Kylin in the future.
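For readers unfamiliar with Space-Saving, the idea can be sketched in a few lines of Python. This is only an illustration of the general algorithm, not the code in this PR; the names `space_saving` and `top_n` are hypothetical, and the linear scan for the minimum counter is kept for brevity.

```python
def space_saving(stream, capacity):
    """Approximate item frequencies using at most `capacity` counters."""
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already monitored: bump its counter.
            counters[item] += 1
        elif len(counters) < capacity:
            # A free counter is available: start monitoring the item.
            counters[item] = 1
        else:
            # No free counter: replace the item with the smallest count.
            # The newcomer inherits that count plus one, so counts may be
            # overestimated but are never underestimated.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return counters


def top_n(counters, n):
    """Return the n (item, count) pairs with the largest counts."""
    return sorted(counters.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


stream = ["a"] * 5 + ["b"] * 3 + ["c", "d"]
print(top_n(space_saving(stream, capacity=3), 2))
```

A production implementation would normally keep the counters in a structure that finds the minimum in constant or logarithmic time instead of scanning the whole dictionary.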
I have also added a test that measures the accuracy of this algorithm. The
following is a rough result. The data set has 1 million rows and follows a
Zipfian distribution. Element cardinality is the number of distinct values in
the data. The columns 20X, 50X, and 100X correspond to space_expand_rate
values of 20, 50, and 100; space_expand_rate sets the number of counters used
by the Space-Saving algorithm.
```
zf exponent = 0.5
Element cardinality    20X     50X     100X
1000                   100%    100%    100%
10000                  100%    100%    100%
100000                 100%    100%    100%
500000                 94%     98%     99%

zf exponent = 0.6, 1
Element cardinality    20X     50X     100X
1000                   100%    100%    100%
10000                  100%    100%    100%
100000                 100%    100%    100%
500000                 100%    100%    100%
```
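A measurement of this kind could be reproduced roughly as in the sketch below. It is not the test added in this PR, and it rests on two assumptions that the description does not confirm: accuracy is taken to be the fraction of the exact top-N items that also appear in the approximate top-N, and the counter count is taken to be `n * space_expand_rate`. The helper names `zipf_stream` and `top_n_accuracy` are hypothetical.

```python
import random
from collections import Counter


def space_saving(stream, capacity):
    """Approximate item frequencies using at most `capacity` counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < capacity:
            counters[item] = 1
        else:
            # Evict the smallest counter; the newcomer inherits its count + 1.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return counters


def zipf_stream(cardinality, exponent, size, seed=0):
    """Draw `size` items from a Zipfian distribution over `cardinality` ids."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** exponent) for rank in range(1, cardinality + 1)]
    return rng.choices(range(cardinality), weights=weights, k=size)


def top_items(counts, n):
    """Return the set of n items with the largest counts (ties by item id)."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {item for item, _ in ranked[:n]}


def top_n_accuracy(stream, n, space_expand_rate):
    """Fraction of the exact top-n found in the approximate top-n."""
    capacity = n * space_expand_rate  # assumption: counters = n * rate
    approx = top_items(space_saving(stream, capacity), n)
    exact = top_items(Counter(stream), n)
    return len(approx & exact) / n


# Small parameters so the sketch runs quickly; the table above used 1 million rows.
data = zipf_stream(cardinality=1000, exponent=0.5, size=20000)
for rate in (20, 50, 100):
    print(rate, top_n_accuracy(data, n=10, space_expand_rate=rate))
```

The numbers printed by this sketch depend on the chosen parameters and seed and are not expected to match the table.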
## Types of changes
What types of changes does your code introduce to Doris?
_Put an `x` in the boxes that apply_
- [ ] Bugfix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Documentation Update (if none of the other choices apply)
- [ ] Code refactor (Modify the code structure, format the code, etc...)
## Checklist
_Put an `x` in the boxes that apply. You can also fill these out after
creating the PR. If you're unsure about any of them, don't hesitate to ask.
We're here to help! This is simply a reminder of what we are going to look for
before merging your code._
- [x] I have created an issue (Fix #ISSUE) and described the bug/feature there in detail
- [x] Compiling and unit tests pass locally with my changes
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] If this change needs a document change, I have updated the document
- [ ] Any dependent changes have been merged