Re: Delay data elements in pipeline by X minutes

2021-07-19 Thread Jan Lukavský
I don't want to speak for Apache Flink - I'm using it via Apache Beam only - but generally speaking, each key will have to be held in state up to the moment when it can be garbage collected. This moment is defined (at least in the Apache Beam case) as the timestamp of the end of the window + allowed lateness.
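The garbage-collection moment Jan describes can be sketched as a small, framework-free helper (an illustration of the rule, not Beam or Flink API; the class and method names are hypothetical):

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of the rule from the thread: keyed window state becomes
// garbage-collectable at end-of-window + allowed lateness.
public class StateGcMoment {
    static Instant gcMoment(Instant windowEnd, Duration allowedLateness) {
        return windowEnd.plus(allowedLateness);
    }

    public static void main(String[] args) {
        Instant windowEnd = Instant.parse("2021-07-19T12:00:00Z");
        // With 15 minutes of allowed lateness, state for this window
        // can be dropped once the watermark passes 12:15.
        System.out.println(gcMoment(windowEnd, Duration.ofMinutes(15)));
    }
}
```

The practical consequence is that the delay a key's state lives for is bounded by the window size plus the configured lateness, not by the number of elements.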

Re: Delay data elements in pipeline by X minutes

2021-07-19 Thread Dario Heinisch
Hey Jan, No, it isn't a logical constraint. The reason is that there are different kinds of users: some pay for live data, while others want a cheaper version where the data is delayed. But what happens if I add a random key (let's say a UUID)? Isn't that bad for performance? Then for every object…

Re: Delay data elements in pipeline by X minutes

2021-07-18 Thread Jan Lukavský
Hi Dario, out of curiosity, could you briefly describe the driving use case? What is the (logical) constraint that drives the requirement? I'd guess that it could be related to waiting for some (external) condition? Or maybe related to late data? I think that there might be better approaches…

Re: Delay data elements in pipeline by X minutes

2021-07-18 Thread Dario Heinisch
Hey Kiran, Yeah, I was thinking of another solution: one PostgreSQL sink & one Kafka sink. I can then process the data in real time and insert it into the DB. Then I would just select the latest rows where created_at >= NOW() - interval '15 minutes', and for any Kafka consumer I would…
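The read-side half of this two-sink idea can be sketched as a timestamp cutoff (a hypothetical helper, not code from the thread; the email's truncated SQL may gate the rows differently): a delayed consumer only sees rows once the configured delay has elapsed since they were created.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical read-side filter for the two-sink approach: live consumers
// read everything, delayed consumers only see rows older than the delay.
public class DelayedReadFilter {
    static boolean visibleToDelayedConsumer(Instant createdAt, Instant now, Duration delay) {
        // A row becomes visible once `delay` has elapsed since it was created,
        // i.e. createdAt <= now - delay.
        return !createdAt.isAfter(now.minus(delay));
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2021-07-18T12:00:00Z");
        Duration delay = Duration.ofMinutes(15);
        // A row from 11:40 is visible (20 minutes old), one from 11:50 is not yet.
        System.out.println(visibleToDelayedConsumer(Instant.parse("2021-07-18T11:40:00Z"), now, delay));
        System.out.println(visibleToDelayedConsumer(Instant.parse("2021-07-18T11:50:00Z"), now, delay));
    }
}
```

This moves the delay out of the pipeline entirely: the stream stays real-time, and the delay is enforced at query time.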

Re: Delay data elements in pipeline by X minutes

2021-07-18 Thread Kiran Japannavar
Hi Dario, Did you explore other options? If your use case (apart from delaying the sink writes) can be solved via Spark Streaming, then maybe Spark Streaming with a micro-batch interval of 15 minutes would help. On Sat, Jul 17, 2021 at 10:17 PM Dario Heinisch wrote: …

Delay data elements in pipeline by X minutes

2021-07-17 Thread Dario Heinisch
Hey there, Hope all is well! I would like to delay my data by 15 minutes before it arrives at my sinks:

stream()
    .map()
    ...
    .print()

I tried implementing my own ProcessFunction, where Timestamper is a custom interface:

public abstract class Timestamper {
    public abstract long …
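The buffering idea behind such a delaying ProcessFunction can be sketched without any Flink dependency (a framework-free illustration under assumptions, not the poster's actual implementation; in Flink the release step would be driven by an onTimer callback and the queue would live in keyed state):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of a delay buffer: hold each element until time has advanced past
// its original timestamp plus the configured delay, then release it in order.
public class DelayBuffer<T> {
    private static final class Pending<T> {
        final long releaseAt;
        final T value;
        Pending(long releaseAt, T value) { this.releaseAt = releaseAt; this.value = value; }
    }

    private final long delayMillis;
    private final PriorityQueue<Pending<T>> queue =
        new PriorityQueue<>(Comparator.comparingLong(p -> p.releaseAt));

    public DelayBuffer(long delayMillis) { this.delayMillis = delayMillis; }

    // Called when an element arrives with its event timestamp.
    public void add(long timestamp, T value) {
        queue.add(new Pending<>(timestamp + delayMillis, value));
    }

    // Called as time advances (in Flink, an event-time timer firing);
    // returns every element whose delay has elapsed, oldest first.
    public List<T> advanceTo(long now) {
        List<T> released = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().releaseAt <= now) {
            released.add(queue.poll().value);
        }
        return released;
    }
}
```

With a 15-minute delay, an element stamped at t=0 stays buffered until time reaches t=900000 ms. Note the state-size implication raised later in the thread: every in-flight element is held for the full delay, so the buffer's size is the pipeline's 15-minute throughput.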