Hi Nick,

I think you can do this with Flink in much the same way as the Samza
documentation describes, by using a stateful CoFlatMapFunction [1], [2].

Please have a look at this snippet [3].
This code implements an updatable stream filter. The first stream is
filtered by words received from the second stream: the filter operator adds
words to, or removes them from, its filter set as they arrive on the second
stream. Both streams are partitioned by the filter word (the join key), so
each parallel task instance is responsible for only a subset of the filter
words.
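
To make the idea concrete without repeating the gist, here is a minimal
sketch in plain Java. It does not use the Flink API and is not the code
from [3]; it only models the logic. All names (UpdatableFilterSketch,
FilterTask, sendWord, sendFilterUpdate) are hypothetical. It also makes two
assumptions: words in the filter set are dropped from the first stream, and
each update on the second stream carries an explicit add/remove flag.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of an updatable stream filter (no Flink dependency).
// All class and method names are hypothetical, chosen for illustration.
class UpdatableFilterSketch {

    /** Models one parallel task instance; it holds only the filter words routed to it. */
    static final class FilterTask {
        private final Set<String> filterWords = new HashSet<>();

        // Plays the role of flatMap1 in a CoFlatMapFunction: a record of the first stream.
        // Assumption: a word that is currently in the filter set is dropped.
        void onWord(String word, List<String> out) {
            if (!filterWords.contains(word)) {
                out.add(word);
            }
        }

        // Plays the role of flatMap2: a record of the second (update) stream.
        // Assumption: the update says explicitly whether to add or remove the word.
        void onFilterUpdate(String word, boolean add) {
            if (add) {
                filterWords.add(word);
            } else {
                filterWords.remove(word);
            }
        }
    }

    private final FilterTask[] tasks;

    UpdatableFilterSketch(int parallelism) {
        tasks = new FilterTask[parallelism];
        for (int i = 0; i < parallelism; i++) {
            tasks[i] = new FilterTask();
        }
    }

    // Both streams use the same key-to-task mapping, so a word and the
    // filter updates for that word always reach the same task instance.
    private FilterTask taskFor(String key) {
        return tasks[Math.floorMod(key.hashCode(), tasks.length)];
    }

    void sendWord(String word, List<String> out) {
        taskFor(word).onWord(word, out);
    }

    void sendFilterUpdate(String word, boolean add) {
        taskFor(word).onFilterUpdate(word, add);
    }

    public static void main(String[] args) {
        UpdatableFilterSketch job = new UpdatableFilterSketch(4);
        List<String> out = new ArrayList<>();
        job.sendWord("foo", out);            // passes: nothing is filtered yet
        job.sendFilterUpdate("foo", true);   // start filtering "foo"
        job.sendWord("foo", out);            // dropped
        job.sendWord("bar", out);            // passes
        job.sendFilterUpdate("foo", false);  // stop filtering "foo"
        job.sendWord("foo", out);            // passes again
        System.out.println(out);             // prints [foo, bar, foo]
    }
}
```

Because both inputs are routed by the same key, no task needs the complete
filter set and the first stream does not have to be broadcast to every task.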

Please let me know if you have questions.

Best,
Fabian

[1]
https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#transformations
[2]
https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#checkpointing-local-variables
[3] https://gist.github.com/fhueske/4ea5422edb5820915fa4



2015-11-10 19:02 GMT+01:00 Nick Dimiduk <ndimi...@gmail.com>:

> Hello,
>
> I'm interested in implementing a table/stream join, very similar to what
> is described in the "Table-stream join" section of the Samza key-value
> state documentation [0]. Conceptually, this would be an extension of the
> example provided in the javadocs for RichFunction#open [1], where I have a
> dataset of searchStrings instead of a single one. As per the Samza
> explanation, I would like to receive updates to this dataset via an
> operation log (e.g., a Kafka topic), so as to update my local state while the
> streaming job runs.
>
> Perhaps you can further advise on parallelization strategy for this
> operation. It seems to me that I'd want to partition the searchString
> database across multiple parallelization units and broadcast my input
> datastream to all those units. The idea being to maximize throughput on
> available hardware, though I would expect there to be a limit at which the
> network plane becomes a bottleneck to the broadcast.
>
> Is there an example of how I might implement this in Flink-Streaming? I
> thought perhaps the DataStream#cross transformation would work, but I
> haven't worked out how to use it to my purpose. Thus far, I'm using the
> Java API.
>
> Thanks a lot!
> -n
>
> [0]:
> http://samza.apache.org/learn/documentation/0.9/container/state-management.html
> [1]:
> https://ci.apache.org/projects/flink/flink-docs-release-0.9/api/java/org/apache/flink/api/common/functions/RichFunction.html#open(org.apache.flink.configuration.Configuration)
>
