Hi Anton, If I got your requirements right, you are looking for a solution that continuously produces updated partial aggregates in a streaming fashion. When a special event (no more trades) is received, you would like to store the last update as a final result. Is that correct?
You can compute continuous updates using a reduce() or fold() function. These will produce a new update for each incoming event. For example: val s: DataStream[(Int, Long)] = ... s.keyBy(_._1) .reduce( (x,y) => (x._1, y._2 + y._2) ) would continuously compute a sum for every key (_._1) and produce an update for each incoming record. You could add a flag to the record and implement a ReduceFunction that marks a record as final when the no-more-trades event is received. With a filter and a data sink you could emit such final records to a persistent data store. Btw.: You can also define custom trigger policies for windows. A custom trigger is called for each element that is added to a window and when certain timers expire. For example with a custom trigger, you can evaluate a window for every second element that is added. You can also define whether the elements in the window should be retained or removed after the evaluation. Best, Fabian 2015-11-24 21:32 GMT+01:00 Anton Polyakov <polyakov.an...@gmail.com>: > Hi Max > > thanks for reply. From what I understand window works in a way that it > buffers records while window is open, then apply transformation once window > close is triggered and pass transformed result. > In my case then window will be open for few hours, then the whole amount > of trades will be processed once window close is triggered. Actually I want > to process events as they are produced without buffering them. It is more > like a stream with some special mark versus windowing seems more like a > batch (if I understand it correctly). > > In other words - buffering and waiting for window to close, then > processing will be equal to simply doing one-off processing when all events > are produced. I am looking for a solution when I am processing events as > they are produced and when source signals "done" my processing is also > nearly done. > > > On Tue, Nov 24, 2015 at 2:41 PM, Maximilian Michels <m...@apache.org> > wrote: > >> Hi Anton, >> >> You should be able to model your problem using the Flink Streaming >> API. The actions you want to perform on the streamed records >> correspond to transformations on Windows. You can indeed use >> Watermarks to signal the window that a threshold for an action has >> been reached. Otherwise an eviction policy should also do it. >> >> Without more details about what you want to do I can only refer you to >> the streaming API documentation: >> Please see >> https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html >> >> Thanks, >> Max >> >> On Sun, Nov 22, 2015 at 8:53 PM, Anton Polyakov >> <polyakov.an...@gmail.com> wrote: >> > Hi >> > >> > I am very new to Flink and in fact never used it. My task (which I >> currently solve using home grown Redis-based solution) is quite simple - I >> have a system which produces some events (trades, it is a financial system) >> and computational chain which computes some measure accumulatively over >> these events. Those events form a long but finite stream, they are produced >> as a result of end of day flow. Computational logic forms a processing DAG >> which computes some measure over these events (VaR). Each trade is >> processed through DAG and at different stages might produce different set >> of subsequent events (like return vectors), eventually they all arrive into >> some aggregator which computes accumulated measure (reducer). >> > >> > Ideally I would like to process trades as they appear (i.e. stream >> them) and once producer reaches end of portfolio (there will be no more >> trades), I need to write final resulting measure and mark it as “end of day >> record”. Of course I also could use a classical batch - i.e. wait until all >> trades are produced and then batch process them, but this will be too >> inefficient. >> > >> > If I use Flink, I will need a sort of watermark saying - “done, no more >> trades” and once this watermark reaches end of DAG, final measure can be >> saved. More generally would be cool to have an indication at the end of DAG >> telling to which input stream position current measure corresponds. >> > >> > I feel my problem is very typical yet I can’t find any solution. All >> examples operate either on infinite streams where nobody cares about >> completion or classical batch examples which rely on fact all input data is >> ready. >> > >> > Can you please hint me. >> > >> > Thank you vm >> > Anton >> > >