Re: Avoiding duplicates in joined stream

2017-08-25 Thread Aljoscha Krettek
Hi, The problem with reduplication in a streaming pipeline is that you need to keep all data that you ever saw or do the de-duplication only on a window. You can do the first by writing a keyed FlatMap operation that keeps state and only emits an incoming element if it hasn't been seen so far.

Avoiding duplicates in joined stream

2017-08-15 Thread Mohit Anchlia
What's the best way to avoid duplicates in joined stream. In below code I get duplicates of "A" because I have multiple of "A" in fileInput3. SingleOutputStreamOperator fileInput3 = streamEnv.fromElements("A", "A") .assignTimestampsAndWatermarks(timestampAndWatermarkAssigner1); fileInput1.join(f