[ https://issues.apache.org/jira/browse/FLINK-10993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700506#comment-16700506 ]
Stephan Ewen commented on FLINK-10993:
--------------------------------------

If I understand the feature proposal correctly, the use of state is not the only part. Other parts would also be how the building/merging works. Furthermore, should this work only for batch, only for streaming, or for both?

> Bring bloomfilter as a public API
> ---------------------------------
>
>                 Key: FLINK-10993
>                 URL: https://issues.apache.org/jira/browse/FLINK-10993
>             Project: Flink
>          Issue Type: New Feature
>          Components: DataStream API
>            Reporter: vinoyang
>            Assignee: vinoyang
>            Priority: Major
>
> Flink internally provides an implementation of BloomFilter, but only for
> internal optimization, and does not provide APIs for public access.
> Here is an earlier discussion on the user mailing list:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Bloom-filter-in-Flink-td10608.html
> Considering that many users need to "determine duplicates" in
> streaming computing, I think it would make sense to provide such an API.
> In addition, Spark has provided BloomFilter as a public API:
> {code:java}
> val bf = df.stat.bloomFilter("dd",dataLen,0.01)
> val rightNum = rdd.map(x=>(x.toInt,bf.mightContainString(x)))
> {code}
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
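
To make the "building/merging" question above concrete, here is a minimal sketch of a mergeable Bloom filter in plain Java. It is not Flink's internal BloomFilter and not a proposed API: the class name SimpleBloomFilter, its methods, and the FNV-style double hashing are all assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical minimal Bloom filter; NOT Flink's internal implementation. */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** Building: set one bit per hash function for the given value. */
    void put(String value) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(value, i));
        }
    }

    /** False means "definitely absent"; true means "possibly present". */
    boolean mightContain(String value) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(value, i))) {
                return false;
            }
        }
        return true;
    }

    /** Merging: bitwise OR, valid only when both filters share the same parameters. */
    void merge(SimpleBloomFilter other) {
        if (other.numBits != numBits || other.numHashes != numHashes) {
            throw new IllegalArgumentException("incompatible filters");
        }
        bits.or(other.bits);
    }

    /** Double hashing: derive the i-th bit position from two base hashes. */
    private int index(String value, int i) {
        byte[] data = value.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193) | 1; // force odd so the step is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** FNV-1a style hash with a caller-supplied starting value (illustrative choice). */
    private static int fnv1a(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }
}
```

The merge step is what a parallel build would rely on: each parallel task fills its own filter, and the partial filters are OR-ed together afterwards. This only works if all partial filters use the same size and hash functions, which is one of the points a public API would have to pin down.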