Hi Dominik, 

I had the same problem once with a custom ProcessFunction. My ProcessFunction
buffered the data for a while and then emitted it again. Since a process
function can do anything with the data (transforming, buffering,
aggregating, ...), I think it is just not safe for Flink to reason about the
watermark of the output.
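
To make the buffering case concrete, here is a small plain-Java simulation (no Flink dependency). The class and method names are made up for illustration, and the "late" check assumes the usual convention that an element is late when its timestamp is not greater than the watermark already forwarded downstream. It is only a sketch of the effect, not of Flink's internals.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation; all names here are hypothetical, not Flink API.
class BufferingWatermarkDemo {

    // Stand-in for a process function that holds every element back and
    // releases the whole buffer only when flush() is called.
    static final class BufferingFunction {
        private final List<Long> buffer = new ArrayList<>();

        void onElement(long timestamp) {
            buffer.add(timestamp);
        }

        List<Long> flush() {
            List<Long> out = new ArrayList<>(buffer);
            buffer.clear();
            return out;
        }
    }

    // Counts flushed elements whose timestamp is not greater than the
    // watermark that was forwarded unchanged while they sat in the buffer.
    static int countLate(long[] inputTimestamps, long forwardedWatermark) {
        BufferingFunction fn = new BufferingFunction();
        for (long ts : inputTimestamps) {
            fn.onElement(ts);
        }
        int late = 0;
        for (long ts : fn.flush()) {
            if (ts <= forwardedWatermark) {
                late++;
            }
        }
        return late;
    }

    public static void main(String[] args) {
        // Elements at t=1000, 2000, 3000 are buffered while the upstream
        // watermark advances to 2500 and passes straight through.
        int late = countLate(new long[] {1000L, 2000L, 3000L}, 2500L);
        System.out.println("late elements after flush: " + late); // prints 2
    }
}
```

Two of the three buffered elements come out behind the watermark that downstream operators have already seen, which is why the input watermark says little about the output.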

I solved all my issues by calling `assignTimestampsAndWatermarks` directly
after the (co-)process function.
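
Roughly, re-assigning after the function means the watermark is derived again from the timestamps of what the function actually emits. Below is a plain-Java sketch of a bounded-out-of-orderness generator fed with output timestamps. The names and the 2000 ms bound are assumptions for illustration only; in the real job this logic would sit in whatever assigner is passed to `assignTimestampsAndWatermarks` on the stream returned by the (co-)process function.

```java
// Plain-Java sketch; class and method names are hypothetical, not Flink API.
class ReassignedWatermarkDemo {

    // Tracks the highest timestamp seen and trails it by a fixed bound.
    static final class BoundedOutOfOrdernessGenerator {
        private final long maxOutOfOrdernessMs;
        private long maxTimestampSeen;

        BoundedOutOfOrdernessGenerator(long maxOutOfOrdernessMs) {
            this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
            // Offset the start value so the subtraction below cannot underflow.
            this.maxTimestampSeen = Long.MIN_VALUE + maxOutOfOrdernessMs;
        }

        void onElement(long timestamp) {
            maxTimestampSeen = Math.max(maxTimestampSeen, timestamp);
        }

        long currentWatermark() {
            return maxTimestampSeen - maxOutOfOrdernessMs;
        }
    }

    // Feeds the timestamps of the function's output into a fresh generator
    // and returns the watermark it would report afterwards.
    static long watermarkAfter(long[] outputTimestamps, long maxOutOfOrdernessMs) {
        BoundedOutOfOrdernessGenerator gen =
                new BoundedOutOfOrdernessGenerator(maxOutOfOrdernessMs);
        for (long ts : outputTimestamps) {
            gen.onElement(ts);
        }
        return gen.currentWatermark();
    }

    public static void main(String[] args) {
        // Output of the buffering function, slightly out of order.
        long wm = watermarkAfter(new long[] {1000L, 3000L, 2000L}, 2000L);
        System.out.println("watermark after function output: " + wm); // prints 1000
    }
}
```

The point of the design is that the watermark now depends only on the emitted records, so it stays consistent no matter how long the function held them back.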

Best regards 
Theo 


From: "Dominik Wosiński" <wos...@gmail.com>
To: "user" <user@flink.apache.org>
Sent: Monday, 16 March 2020 16:55:18
Subject: Issues with Watermark generation after join

Hey, 
I have noticed a weird behavior with a job that I am currently working on. I
have 4 different streams from Kafka, let's call them A, B, C and D. Now the idea
is that first I do a SQL join of A & B based on some field, then I create an
append stream from the joined A & B, let's call it E. Then I need to assign
timestamps to E, since it is the result of a join and Flink can't figure out
the timestamps.

Next, I union E & C to create a stream F. Then finally I connect E & C
using `keyBy` and a CoProcessFunction. Now the issue I am facing is that it
works fine if I enforce the parallelism of E to be 1 by invoking
`setParallelism`. But if the parallelism is higher than 1, for the same data,
the watermark is not progressing correctly. I can see that the
CoProcessFunction methods are invoked and that data is produced, but the
watermark never progresses for this function. What I can see is that the
watermark is always equal to (0 - allowedOutOfOrderness). I can see that
timestamps are correctly extracted, and when I add debug prints I can actually
see that watermarks are generated for all streams, but for some reason, if the
parallelism is > 1, they never progress up to the connect function. Is there
anything that needs to be done after SQL joins that I don't know of?

Best Regards, 
Dom. 
