Hi, you might be interested in this newly-introduced feature: https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/state/broadcast_state.html <https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/state/broadcast_state.html>
Best, Aljoscha > On 5. Jun 2018, at 14:46, Amit Jain <aj201...@gmail.com> wrote: > > Hi Sandybayev, > > In the current state, Flink does not provide a solution to the > mentioned use case. However, there is open FLIP[1] [2] which has been > created to address the same. > > I can see in your current approach, you are not able to update the > rule set data. I think you can update rule set data by building > DataStream around changelogs which are stored in message > queue/distributed file system. > OR > You can store rule set data in the external system where you can query > for incoming keys from Flink. > > [1]: > https://cwiki.apache.org/confluence/display/FLINK/FLIP-17+Side+Inputs+for+DataStream+API > [2]: https://issues.apache.org/jira/browse/FLINK-6131 > > On Tue, Jun 5, 2018 at 5:52 PM, Sandybayev, Turar (CAI - Atlanta) > <turar.sandyba...@coxautoinc.com> wrote: >> Hi, >> >> >> >> What is the best practice recommendation for the following use case? We need >> to match a stream against a set of “rules”, which are essentially a Flink >> DataSet concept. Updates to this “rules set" are possible but not frequent. >> Each stream event must be checked against all the records in “rules set”, >> and each match produces one or more events into a sink. Number of records in >> a rule set are in the 6 digit range. >> >> >> >> Currently we're simply loading rules into a local List of rules and using >> flatMap over an incoming DataStream. Inside flatMap, we're just iterating >> over a list comparing each event to each rule. >> >> >> >> To speed up the iteration, we can also split the list into several batches, >> essentially creating a list of lists, and creating a separate thread to >> iterate over each sub-list (using Futures in either Java or Scala). >> >> >> >> Questions: >> >> 1. Is there a better way to do this kind of a join? >> >> 2. If not, is it safe to add additional parallelism by creating >> new threads inside each flatMap operation, on top of what Flink is already >> doing? >> >> >> >> Thanks in advance! >> >> Turar >> >>