And since you asked for a pointer, Ali: http://docs.confluent.io/current/streams/concepts.html#windowing
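
To make the distinction discussed in this thread concrete, here is a rough sketch in plain Python. It is not Kafka Streams code and does not use its API; it only simulates the idea under stated assumptions. The 15-minute tumbling window, the extra message C (timestamp 5:10 PM), and the names `window_start`, `process`, and `ARRIVALS` are all hypothetical, and timestamps are simplified to minutes since midnight. The sketch also ignores any limit on how late a record may arrive. It shows that a stateless step sees records in arrival order, while a windowed count places each record in the window of its own timestamp and emits an amended result when a late record shows up.

```python
from collections import defaultdict

WINDOW_MIN = 15  # hypothetical tumbling-window size, in minutes


def window_start(ts_min, size_min=WINDOW_MIN):
    """Return the start of the tumbling window that contains ts_min."""
    return ts_min - (ts_min % size_min)


def process(records, size_min=WINDOW_MIN):
    """Simulate one partition. records is a list of (name, event_ts_min)
    given in ARRIVAL order, which may differ from timestamp order.

    Returns (seen, updates, counts):
      seen    - names in the order a stateless step would observe them
      updates - (window_start, count) emitted after every record
      counts  - final count per window
    """
    counts = defaultdict(int)
    seen = []
    updates = []
    for name, ts in records:
        seen.append(name)              # stateless step: plain arrival order
        w = window_start(ts, size_min)  # window chosen by event timestamp
        counts[w] += 1
        updates.append((w, counts[w]))  # new or amended result for window w
    return seen, updates, dict(counts)


# A = 5:00 PM (1020), B = 5:15 PM (1035), C = 5:10 PM (1030, hypothetical).
# A is delayed and arrives last.
ARRIVALS = [("C", 1030), ("B", 1035), ("A", 1020)]

if __name__ == "__main__":
    seen, updates, counts = process(ARRIVALS)
    print(seen)     # ['C', 'B', 'A']
    print(updates)  # [(1020, 1), (1035, 1), (1020, 2)]
    print(counts)   # {1020: 2, 1035: 1}
```

The last update, `(1020, 2)`, is the amended result for the 5:00 PM window once the late message A arrives. The final counts are the same as if the records had arrived in timestamp order, but the stateless step still saw C, B, A.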

On Mon, Mar 20, 2017 at 5:43 PM, Michael Noll <mich...@confluent.io> wrote:

> Late-arriving and out-of-order data is only treated specially for windowed
> aggregations.
>
> For stateless operations such as `KStream#foreach()` or `KStream#map()`,
> records are processed in the order they arrive (per partition).
>
> -Michael
>
> On Sat, Mar 18, 2017 at 10:47 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> > later when message A arrives it will put that message back into
>> > the right temporal context and publish an amended result for the proper
>> > time/session window as if message A had been consumed in timestamp order
>> > before message B.
>>
>> Does this apply to the aggregation Kafka Streams methods then, and not to
>> e.g. foreach?
>>
>> On Sun, Mar 19, 2017 at 2:40 AM, Hans Jespersen <h...@confluent.io> wrote:
>>
>> > Yes, stream processing and CEP are subtly different things.
>> >
>> > Kafka Streams helps you write stateful apps and allows that state to be
>> > preserved on disk (a local state store) as well as distributed for HA or
>> > for parallel partitioned processing (via Kafka topic partitions and
>> > consumer groups) as well as in memory (as a performance enhancement).
>> >
>> > However, a classical CEP engine with a pre-modeled state machine and
>> > pattern matching rules is something different from stream processing.
>> >
>> > It is of course possible to build a CEP system on top of Kafka Streams
>> > and get the best of both worlds.
>> >
>> > -hans
>> >
>> > > On Mar 18, 2017, at 11:36 AM, Sabarish Sasidharan <sabarish....@gmail.com> wrote:
>> > >
>> > > Hans
>> > >
>> > > What you state would work for aggregations, but not for state machines
>> > > and CEP.
>> > >
>> > > Regards
>> > > Sab
>> > >
>> > >> On 19 Mar 2017 12:01 a.m., "Hans Jespersen" <h...@confluent.io> wrote:
>> > >>
>> > >> The only way to make sure A is consumed first would be to delay the
>> > >> consumption of message B for at least 15 minutes, which would fly in
>> > >> the face of the principles of a true streaming platform. So the short
>> > >> answer to your question is "no", because that would be batch
>> > >> processing, not stream processing.
>> > >>
>> > >> However, Kafka Streams does handle late-arriving data. So if you had
>> > >> some analytics that computes results on a time window or a session
>> > >> window, then Kafka Streams will compute on the stream in real time
>> > >> (processing message B), and then later, when message A arrives, it will
>> > >> put that message back into the right temporal context and publish an
>> > >> amended result for the proper time/session window, as if message A had
>> > >> been consumed in timestamp order before message B. The end result of
>> > >> this flow is that you eventually get the same results you would get in
>> > >> a batch processing system, but with the added benefit of getting
>> > >> intermediate results at much lower latency.
>> > >>
>> > >> -hans
>> > >>
>> > >> /**
>> > >>  * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
>> > >>  * h...@confluent.io (650)924-2670
>> > >>  */
>> > >>
>> > >>> On Sat, Mar 18, 2017 at 10:29 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>> > >>>
>> > >>> Is it possible to have Kafka Streams order messages correctly by their
>> > >>> timestamps, even if they arrived out of order?
>> > >>>
>> > >>> E.g., say Message A with a timestamp of 5:00 PM and Message B with a
>> > >>> timestamp of 5:15 PM are sent.
>> > >>>
>> > >>> Message B arrives sooner than Message A, due to network issues.
>> > >>>
>> > >>> Is it possible to make sure that, across all consumers of Kafka Streams
>> > >>> (even if they are on different servers, but have the same consumer
>> > >>> group), Message A is consumed first, before Message B?
>> > >>>
>> > >>> Thanks.