[ https://issues.apache.org/jira/browse/FLINK-10993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16698824#comment-16698824 ]
vinoyang commented on FLINK-10993:
----------------------------------

[~fhueske] Yes, the goal of that issue looks very similar to my idea. Does the community have a specific plan for that issue? I can assist in reviewing it. I think it would be great to see it in Flink 1.8.

> Bring bloomfilter as a public API
> ---------------------------------
>
>                 Key: FLINK-10993
>                 URL: https://issues.apache.org/jira/browse/FLINK-10993
>             Project: Flink
>          Issue Type: New Feature
>          Components: DataStream API
>            Reporter: vinoyang
>            Assignee: vinoyang
>            Priority: Major
>
> Flink internally provides an implementation of BloomFilter, but it is used
> only for internal optimization and is not exposed through a public API.
> An earlier discussion on the user mailing list:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Bloom-filter-in-Flink-td10608.html
> Considering that many users need to "determine duplicates" in
> streaming computing, I think it would make sense to provide such an API.
> In addition, Spark already provides BloomFilter as a public API:
> {code:java}
> val bf = df.stat.bloomFilter("dd", dataLen, 0.01)
> val rightNum = rdd.map(x => (x.toInt, bf.mightContainString(x)))
> {code}
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)