[ https://issues.apache.org/jira/browse/FLINK-10993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701783#comment-16701783 ]
vinoyang commented on FLINK-10993:
----------------------------------

[~StephanEwen] In my opinion, the PR for FLINK-8601 has already done a lot of the work. I intend to check it out locally, review it, make any necessary adjustments, and try it out in our business. Do you have any better ideas?

cc [~fhueske]

> Bring bloomfilter as a public API
> ---------------------------------
>
>                 Key: FLINK-10993
>                 URL: https://issues.apache.org/jira/browse/FLINK-10993
>             Project: Flink
>          Issue Type: New Feature
>          Components: DataStream API
>            Reporter: vinoyang
>            Assignee: vinoyang
>            Priority: Major
>
> Flink internally provides an implementation of BloomFilter, but it is used
> only for internal optimization and is not exposed through a public API.
> Here is an earlier discussion on the user mailing list:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Bloom-filter-in-Flink-td10608.html
> Considering that many users need to "determine duplicates" in
> streaming computing, I think it would make sense to provide such an API.
> In addition, Spark already provides BloomFilter as a public API:
> {code:java}
> val bf = df.stat.bloomFilter("dd", dataLen, 0.01)
> val rightNum = rdd.map(x => (x.toInt, bf.mightContainString(x)))
> {code}
>
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
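
For readers unfamiliar with the "determine duplicates" use case mentioned in the issue description, the following is a minimal, standalone Java sketch of how a Bloom filter can flag probable duplicates. It is an illustration under stated assumptions only: the class name SimpleBloomFilter and the methods addIfAbsent and mightContain are hypothetical, and the code is neither Flink's internal BloomFilter nor the API proposed in FLINK-8601 or FLINK-10993.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical illustration only: not Flink's internal BloomFilter and not a proposed API.
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int expectedEntries, double falsePositiveRate) {
        if (expectedEntries <= 0 || falsePositiveRate <= 0.0 || falsePositiveRate >= 1.0) {
            throw new IllegalArgumentException("need expectedEntries > 0 and 0 < falsePositiveRate < 1");
        }
        double ln2 = Math.log(2.0);
        // Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hash functions.
        this.numBits = Math.max(1,
                (int) Math.ceil(-expectedEntries * Math.log(falsePositiveRate) / (ln2 * ln2)));
        this.numHashes = Math.max(1, (int) Math.round((double) numBits / expectedEntries * ln2));
        this.bits = new BitSet(numBits);
    }

    // Records the key. Returns true if at least one of its bits was unset,
    // which means the key was definitely not seen before.
    boolean addIfAbsent(String key) {
        long h = hash64(key);
        boolean isNew = false;
        for (int i = 0; i < numHashes; i++) {
            int idx = index(h, i);
            if (!bits.get(idx)) {
                isNew = true;
                bits.set(idx);
            }
        }
        return isNew;
    }

    // Returns false if the key was definitely never added; true means "probably added".
    boolean mightContain(String key) {
        long h = hash64(key);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(h, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: derive the i-th bit position from the two 32-bit halves of one 64-bit hash.
    private int index(long h, int i) {
        int h1 = (int) h;
        int h2 = (int) (h >>> 32) | 1; // force the step to be odd so it is never zero
        return (int) Math.floorMod(h1 + (long) i * h2, (long) numBits);
    }

    // FNV-1a over the UTF-8 bytes, followed by one extra mixing step.
    private static long hash64(String key) {
        long h = 0xcbf29ce484222325L;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        return h;
    }
}
```

With a filter created as new SimpleBloomFilter(1000, 0.01), the first addIfAbsent("order-42") returns true and a second call with the same key returns false, so a stream operator could drop the second record as a duplicate. The trade-off is that a key that was never added may occasionally be reported as already seen (a false positive), while a key that was added is never missed.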