Just turn the list into a DataStream. A DataStream can be finite.

One solution, in a streaming environment, is to use Kafka or any other 
distributed message broker. Kafka is convenient because Flink ships with a 
KafkaSource.

 

1) Create a Kafka topic dedicated to your list of key/value pairs. Inject your 
values into this topic, partitioned by key, so that you can recover the keys 
in Flink.

 

2) Create a source for the stream of tuples you're analysing -> output 1 (tuples).

 

3) Create a KafkaSource and parse/recover your key/value pairs from this 
source (e.g. in a first map operator): map1 -> output 2 (K,V). Then:

    a) If you need all key/value pairs at each operator instance: broadcast 
       all partitions of output 2 (the K,V pairs) to the analysis operator.

    b) If you don't need all key/value pairs: just connect output 2 to the 
       analysis operator. The partitioning of the K,V pairs will depend on 
       the Kafka partitioning strategy, and can be controlled in Flink anyway.

 

4) The analysis operator applies a RichCoFlatMapFunction and can be 
checkpointed.

When it receives K,V pairs from output 2, it stores them in local state.

When it receives a tuple, it filters it with the help of the local state and 
either propagates it downstream or drops it.
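
To make step 4 (and option b) concrete, here is a rough sketch in plain Java. 
It deliberately does not use the Flink API: the flatMap1/flatMap2 method names 
only mirror the two callbacks of a co-flat-map function, and the class name, 
the String types, the "key is known" filter rule and the instanceFor helper 
are all assumptions made up for illustration. Flink's own key partitioning 
uses a different hash, so treat instanceFor as a stand-in only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Plain-Java stand-in for the two-input analysis operator of step 4.
 * Not Flink code: it only mimics the shape of a co-flat-map function.
 */
class AnalysisOperatorSketch {
    // Local state: the key/value pairs routed to this operator instance.
    private final Map<String, String> localState = new HashMap<>();

    /** Input 1 (output 1 above): a tuple, modelled here as (key, payload). */
    public void flatMap1(String key, String payload, Consumer<String> out) {
        // Hypothetical filter rule: forward a tuple only if its key is known.
        if (localState.containsKey(key)) {
            out.accept(key + ":" + payload);
        }
    }

    /** Input 2 (output 2 above): one key/value pair of the static list. */
    public void flatMap2(String key, String value) {
        localState.put(key, value);
    }

    /** Hypothetical router: maps a key to one of `parallelism` instances. */
    public static int instanceFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        int parallelism = 2;
        AnalysisOperatorSketch[] instances = new AnalysisOperatorSketch[parallelism];
        for (int i = 0; i < parallelism; i++) {
            instances[i] = new AnalysisOperatorSketch();
        }

        // Option b): each K,V pair goes only to the instance that owns its key.
        String[][] pairs = {{"a", "1"}, {"b", "2"}};
        for (String[] kv : pairs) {
            instances[instanceFor(kv[0], parallelism)].flatMap2(kv[0], kv[1]);
        }

        // Tuples are routed with the same function, so they meet their key.
        String[][] tuples = {{"a", "x"}, {"c", "y"}, {"b", "z"}};
        List<String> emitted = new ArrayList<>();
        for (String[] t : tuples) {
            instances[instanceFor(t[0], parallelism)].flatMap1(t[0], t[1], emitted::add);
        }

        // Only tuples whose key was loaded into local state are forwarded.
        System.out.println(emitted);
    }
}
```

For option a) you would instead call flatMap2 on every instance for every 
pair. Note that the sketch loads all K,V pairs before any tuple arrives; in a 
real job the two inputs are not ordered relative to each other, so a tuple 
can show up before its key has been stored, and you have to decide whether to 
buffer or drop it.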


> Message of 30/05/16 13:41
> From: leon_mcl...@tutanota.com
> To: "User" 
> Cc: 
> Subject: Elegantly sharing state in a streaming environment
> 
> Hello Flink team,

How can I partition and share static state among instances of a streaming 
operator? 

I have a huge list of keys and values, which are used to filter tuples in a 
stream. The list does not change. Currently I am sharing the list with each 
operator instance via the constructor, although only a subset of the list is 
required per operator (the assignment of subset to operator instance is known). 
I cannot use DataSet-based functions in a streaming execution environment to 
assign sublists. I also cannot use DataStream-based partitioning functions, as 
the list is static, i.e. not a DataStream. The dilemma exists because I am 
mixing static (DataSet-type) content with streaming content. Is there any other 
approach aside from using an additional tool (e.g. a distributed cache)?

Thanks in advance.

Regards
Leon


