Re: Watermarks as "process completion" flags

Fabian Hueske Tue, 24 Nov 2015 12:54:07 -0800

Hi Anton,

If I got your requirements right, you are looking for a solution that
continuously produces updated partial aggregates in a streaming fashion.
When a  special event (no more trades) is received, you would like to store
the last update as a final result. Is that correct?


You can compute continuous updates using a reduce() or fold() function.
These will produce a new update for each incoming event.
For example:

val s: DataStream[(Int, Long)] = ...
s.keyBy(_._1)
  .reduce( (x,y) => (x._1, y._2 + y._2) )

would continuously compute a sum for every key (_._1) and produce an update
for each incoming record.

You could add a flag to the record and implement a ReduceFunction that
marks a record as final when the no-more-trades event is received.
With a filter and a data sink you could emit such final records to a
persistent data store.

Btw.: You can also define custom trigger policies for windows. A custom
trigger is called for each element that is added to a window and when
certain timers expire. For example with a custom trigger, you can evaluate
a window for every second element that is added. You can also define
whether the elements in the window should be retained or removed after the
evaluation.

Best, Fabian



2015-11-24 21:32 GMT+01:00 Anton Polyakov <polyakov.an...@gmail.com>:

> Hi Max
>
> thanks for reply. From what I understand window works in a way that it
> buffers records while window is open, then apply transformation once window
> close is triggered and pass transformed result.
> In my case then window will be open for few hours, then the whole amount
> of trades will be processed once window close is triggered. Actually I want
> to process events as they are produced without buffering them. It is more
> like a stream with some special mark versus windowing seems more like a
> batch (if I understand it correctly).
>
> In other words - buffering and waiting for window to close, then
> processing will be equal to simply doing one-off processing when all events
> are produced. I am looking for a solution when I am processing events as
> they are produced and when source signals "done" my processing is also
> nearly done.
>
>
> On Tue, Nov 24, 2015 at 2:41 PM, Maximilian Michels <m...@apache.org>
> wrote:
>
>> Hi Anton,
>>
>> You should be able to model your problem using the Flink Streaming
>> API. The actions you want to perform on the streamed records
>> correspond to transformations on Windows. You can indeed use
>> Watermarks to signal the window that a threshold for an action has
>> been reached. Otherwise an eviction policy should also do it.
>>
>> Without more details about what you want to do I can only refer you to
>> the streaming API documentation:
>> Please see
>> https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html
>>
>> Thanks,
>> Max
>>
>> On Sun, Nov 22, 2015 at 8:53 PM, Anton Polyakov
>> <polyakov.an...@gmail.com> wrote:
>> > Hi
>> >
>> > I am very new to Flink and in fact never used it. My task (which I
>> currently solve using home grown Redis-based solution) is quite simple - I
>> have a system which produces some events (trades, it is a financial system)
>> and computational chain which computes some measure accumulatively over
>> these events. Those events form a long but finite stream, they are produced
>> as a result of end of day flow. Computational logic forms a processing DAG
>> which computes some measure over these events (VaR). Each trade is
>> processed through DAG and at different stages might produce different set
>> of subsequent events (like return vectors), eventually they all arrive into
>> some aggregator which computes accumulated measure (reducer).
>> >
>> > Ideally I would like to process trades as they appear (i.e. stream
>> them) and once producer reaches end of portfolio (there will be no more
>> trades), I need to write final resulting measure and mark it as “end of day
>> record”. Of course I also could use a classical batch - i.e. wait until all
>> trades are produced and then batch process them, but this will be too
>> inefficient.
>> >
>> > If I use Flink, I will need a sort of watermark saying - “done, no more
>> trades” and once this watermark reaches end of DAG, final measure can be
>> saved. More generally would be cool to have an indication at the end of DAG
>> telling to which input stream position current measure corresponds.
>> >
>> > I feel my problem is very typical yet I can’t find any solution. All
>> examples operate either on infinite streams where nobody cares about
>> completion or classical batch examples which rely on fact all input data is
>> ready.
>> >
>> > Can you please hint me.
>> >
>> > Thank you vm
>> > Anton
>>
>
>

Re: Watermarks as "process completion" flags

Reply via email to