Hi,
The problem with reduplication in a streaming pipeline is that you need to keep
all data that you ever saw or do the de-duplication only on a window. You can
do the first by writing a keyed FlatMap operation that keeps state and only
emits an incoming element if it hasn't been seen so far.
What's the best way to avoid duplicates in joined stream. In below code I
get duplicates of "A" because I have multiple of "A" in fileInput3.
SingleOutputStreamOperator fileInput3 = streamEnv.fromElements("A",
"A")
.assignTimestampsAndWatermarks(timestampAndWatermarkAssigner1);
fileInput1.join(f