Thanks for reporting back, and for the debugging advice!
Best, Fabian
2018-06-08 9:00 GMT+02:00 Juho Autio :
Flink was NOT at fault. Turns out our Kafka producer had OS level clock
sync problems :(
Because of that, our Kafka occasionally had some messages in between with
an incorrect timestamp. In practice they were about 7 days older than they
should have been.
I'm really sorry for wasting your time on this.
Thanks for correcting me Piotr. I didn't look closely enough at the code.
With the currently implemented logic, a record should not be emitted to the
side output if its window hasn't been closed yet.
2018-05-11 14:13 GMT+02:00 Piotr Nowojski :
Generally speaking, the best practice is always to simplify your program as
much as possible to narrow down the scope of the search: replace the data
source with statically generated events, remove unnecessary components, etc.
Either such a process will help you figure out what's wrong on your own, or,
if not, it will leave you with a minimal program that is much easier for
others to reason about and help with.
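For example, something along these lines; it's just a rough sketch with
made-up keys, timestamps, and window sizes, so adapt it to mirror your
actual job:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.OutputTag;

public class LateDataRepro {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // Statically generated events instead of the Kafka source:
        // (key, event-time millis). The last element is out of order.
        DataStream<Tuple2<String, Long>> events = env
                .fromElements(
                        Tuple2.of("a", 1_000L),
                        Tuple2.of("a", 10_000L),
                        Tuple2.of("a", 2_000L))
                .assignTimestampsAndWatermarks(
                        new BoundedOutOfOrdernessTimestampExtractor<Tuple2<String, Long>>(
                                Time.seconds(1)) {
                            @Override
                            public long extractTimestamp(Tuple2<String, Long> e) {
                                return e.f1;
                            }
                        });

        OutputTag<Tuple2<String, Long>> lateTag =
                new OutputTag<Tuple2<String, Long>>("late-data") {};

        SingleOutputStreamOperator<Tuple2<String, Long>> summed = events
                .keyBy(0)
                .window(TumblingEventTimeWindows.of(Time.seconds(5)))
                .allowedLateness(Time.seconds(5))
                .sideOutputLateData(lateTag)
                .sum(1);

        summed.print();                        // on-time results
        summed.getSideOutput(lateTag).print(); // records dropped as late

        env.execute("late-data-repro");
    }
}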
Thanks for that code snippet, I should try it out to simulate my DAG. If
you have any suggestions on how to debug further what's causing late data
on a production stream job, please let me know.
On Fri, May 11, 2018 at 2:18 PM, Piotr Nowojski wrote:
Hey,
Actually I think Fabian's initial message was incorrect. As far as I can see
in the code of WindowOperator (last lines of
org.apache.flink.streaming.runtime.operators.windowing.WindowOperator#processElement),
the element is sent to the late side output only if it is late AND it wasn't
assigned to any window, i.e. every window it could belong to has already
been cleaned up.
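Roughly, the relevant logic looks like this (a simplified paraphrase from
memory, not the verbatim Flink source):

// Tail of WindowOperator#processElement, event-time path (paraphrased).
Collection<W> elementWindows = windowAssigner.assignWindows(
        element.getValue(), element.getTimestamp(), windowAssignerContext);

boolean isSkippedElement = true;
for (W window : elementWindows) {
    // Windows whose cleanup time (end + allowed lateness) has already
    // passed are skipped entirely.
    if (isWindowLate(window)) {
        continue;
    }
    isSkippedElement = false;
    // ... add the element to the window state, evaluate the trigger ...
}

// Only an element that was skipped by EVERY window it belongs to, AND is
// late with respect to the current watermark, goes to the side output
// (or is counted as dropped if no side output is configured).
if (isSkippedElement && isElementLate(element)) {
    if (lateDataOutputTag != null) {
        sideOutput(element);
    } else {
        numLateRecordsDropped.inc();
    }
}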
Thanks. What I still don't get is why my message got filtered in the first
place. Even if the allowed lateness filtering were done "on the window",
data should not be dropped as late if it isn't in fact late by more than
the allowedLateness setting.
Assuming that these conditions hold:
- messa
Hi Juho,
Thanks for bringing up this topic! I share your intuition.
IMO, records should only be filtered out and sent to a side output if any of
the windows they would be assigned to is already closed.
I had a look into the code and found that records are filtered out as late
based on the following check in WindowOperator#isElementLate:
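Paraphrased from memory (approximately; not a verbatim copy of the Flink
source), it boils down to:

protected boolean isElementLate(StreamRecord<IN> element) {
    // An element counts as late once the watermark has passed its
    // timestamp plus the allowed lateness (event-time jobs only).
    return windowAssigner.isEventTime()
            && element.getTimestamp() + allowedLateness
                    <= internalTimerService.currentWatermark();
}

Taken on its own, this check only compares the element's timestamp against
the current watermark; it does not inspect the state of the element's
windows.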
I don't understand why I'm getting some data discarded as late on my Flink
stream job a long time before the window even closes.
I cannot be 100% sure, but to me it seems like the Kafka consumer is
basically causing the data to be dropped as "late", not the window. I
didn't expect this to ever happen.
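For reference, the timestamps and watermarks are assigned on the Kafka
consumer itself, roughly like below (a sketch of the assumed setup; Event,
EventDeserializationSchema, and kafkaProps are made-up names):

// When the assigner is attached to the consumer, watermarks are generated
// per Kafka partition and merged inside the source, so the consumer (not
// the window operator) determines how far event time has advanced by the
// time a record reaches the window.
FlinkKafkaConsumer011<Event> consumer = new FlinkKafkaConsumer011<>(
        "events", new EventDeserializationSchema(), kafkaProps);

consumer.assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<Event>(Time.minutes(1)) {
            @Override
            public long extractTimestamp(Event e) {
                return e.timestamp;
            }
        });

DataStream<Event> stream = env.addSource(consumer);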