Chengxiang Li created FLINK-2533:
------------------------------------

             Summary: Gap based random sample optimization
                 Key: FLINK-2533
                 URL: https://issues.apache.org/jira/browse/FLINK-2533
             Project: Flink
          Issue Type: Improvement
          Components: Core
            Reporter: Chengxiang Li
            Priority: Minor

For fraction-based random samplers such as BernoulliSampler and PoissonSampler, a gap-based random sampler could use an O(k) implementation (k being the number of sampled elements) instead of the previous O(n) implementation (n being the input size). It should perform better when the sample fraction is very small. [This blog|http://erikerlandson.github.io/blog/2014/09/11/faster-random-samples-with-gap-sampling/] describes gap-based random sampling in more detail.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
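
To illustrate the idea in the description, here is a minimal sketch of gap-based Bernoulli sampling. It is written in Python for brevity and is not Flink's BernoulliSampler code; the function name `gap_bernoulli_sample` and its parameters are hypothetical. It assumes the usual formulation of gap sampling: rather than drawing one random number per element, draw the length of the gap to the next accepted element from a geometric distribution.

```python
import math
import random
import sys
from itertools import islice

_END = object()  # sentinel marking an exhausted input


def gap_bernoulli_sample(items, fraction, rng=None):
    """Yield each element of `items` independently with probability `fraction`.

    Hypothetical helper for illustration only. It draws one random number
    per *sampled* element instead of one per *input* element.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = rng or random.Random()
    it = iter(items)
    if fraction == 0.0:
        return
    if fraction == 1.0:
        yield from it
        return

    log_q = math.log1p(-fraction)  # log(1 - fraction), always negative here
    while True:
        u = 1.0 - rng.random()  # uniform in (0, 1], so log(u) is finite
        # Number of rejected elements before the next accepted one:
        # P(gap >= g) = (1 - fraction) ** g, i.e. a geometric distribution.
        gap = math.log(u) / log_q
        if gap >= sys.maxsize:
            return  # gap is longer than any practical input
        gap = int(gap)
        # Skip `gap` elements and take the one after them.
        picked = next(islice(it, gap, gap + 1), _END)
        if picked is _END:
            return
        yield picked


if __name__ == "__main__":
    sample = list(gap_bernoulli_sample(range(1_000_000), 0.001))
    print(len(sample))  # close to 1000 on average
```

On a generic iterator the skipped elements are still consumed one by one, so the saving in this sketch is in random number generation: about k draws instead of n. With random-access input the skip could become an index jump.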