I don't want to speak for Apache Flink - I'm using it via Apache Beam
only - but generally speaking, each key will have to be held in state up
to some moment when it can be garbage collected. This moment is defined
(at least in the Apache Beam case) as the timestamp of the end of the
window + allowed lateness.
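As a rough illustration of that retention rule in the Flink DataStream API
(a sketch with assumed names and durations, not code from this thread):
state for a window pane can only be dropped once the watermark has passed
the end of the window plus the allowed lateness.

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class LatenessExample {

    // Keyed window state is retained until the watermark passes
    // window end + allowed lateness; only then can it be garbage collected.
    public static DataStream<Tuple2<String, Long>> sumPerKey(DataStream<Tuple2<String, Long>> events) {
        return events
                .keyBy(e -> e.f0)
                .window(TumblingEventTimeWindows.of(Time.minutes(15)))
                .allowedLateness(Time.minutes(5))
                .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1));
    }
}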
Hey Jan,
No, it isn't a logical constraint. The reason is that there are different
kinds of users: some pay for live data, while others want a cheaper version
where the data is delayed.
But what happens if I add a random key (let's say a UUID)? Isn't that
bad for performance? Then for every Objec
Hi Dario,
out of curiosity, could you briefly describe the driving use case? What
is the (logical) constraint that drives the requirement? I'd guess that
it could be related to waiting for some (external) condition? Or maybe
related to late data? I think that there might be better approaches
Hey Kiran,
Yeah, I was thinking of another solution: I have one PostgreSQL sink &
one Kafka sink.
So I can just process the data in real time and insert it into the DB.
Then I would just select the latest row where created_at >= NOW() -
interval '15 minutes', and for any Kafka consumer I would
Hi Dario,
Did you explore other options? If your use case (apart from delaying sink
writes) can be solved via Spark Streaming, then maybe Spark Streaming with
a micro-batch interval of 15 minutes would help.
On Sat, Jul 17, 2021 at 10:17 PM Dario Heinisch wrote:
Hey there,
Hope all is well!
I would like to delay my data by 15 minutes before it arrives at my
sinks:
stream()
    .map()
    [...]
    .print()
I tried implementing my own ProcessFunction, where Timestamper is a
custom interface:
public abstract class Timestamper {
    // Method name assumed; the original message is truncated at this point.
    public abstract long timestamp();
}
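One possible way to implement the delay described above, sketched under the
assumption that event-time timestamps and watermarks are assigned upstream
(DelayFunction and the other names are illustrative, not from this thread):
buffer each element in keyed state and re-emit it once an event-time timer
set to its timestamp plus 15 minutes fires.

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class DelayFunction extends KeyedProcessFunction<String, String, String> {

    private static final long DELAY_MS = 15 * 60 * 1000L;

    // (release timestamp, element) pairs waiting to be emitted for the current key
    private transient ListState<Tuple2<Long, String>> pending;

    @Override
    public void open(Configuration parameters) {
        ListStateDescriptor<Tuple2<Long, String>> descriptor =
                new ListStateDescriptor<>(
                        "pending-elements",
                        TypeInformation.of(new TypeHint<Tuple2<Long, String>>() {}));
        pending = getRuntimeContext().getListState(descriptor);
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
        // Assumes every element carries an event-time timestamp.
        long releaseAt = ctx.timestamp() + DELAY_MS;
        pending.add(Tuple2.of(releaseAt, value));
        ctx.timerService().registerEventTimeTimer(releaseAt);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
        // Emit everything whose release time has been reached; keep the rest buffered.
        List<Tuple2<Long, String>> stillPending = new ArrayList<>();
        for (Tuple2<Long, String> entry : pending.get()) {
            if (entry.f0 <= timestamp) {
                out.collect(entry.f1);
            } else {
                stillPending.add(entry);
            }
        }
        pending.update(stillPending);
    }
}

Timers need a keyed stream, e.g. stream.keyBy(...).process(new DelayFunction()).print();
that is also where the question above about keying by a random UUID comes in:
with a per-element UUID key every element gets its own state entry and timer,
so a coarser key may be considerably cheaper.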