Late-arriving and out-of-order data is only treated specially for windowed aggregations.
For stateless operations such as `KStream#foreach()` or `KStream#map()`, records are processed in the order they arrive (per partition). -Michael On Sat, Mar 18, 2017 at 10:47 PM, Ali Akhtar <ali.rac...@gmail.com> wrote: > > later when message A arrives it will put that message back into > > the right temporal context and publish an amended result for the proper > > time/session window as if message B were consumed in the timestamp order > > before message A. > > Does this apply to the aggregation Kafka stream methods then, and not to > e.g foreach? > > On Sun, Mar 19, 2017 at 2:40 AM, Hans Jespersen <h...@confluent.io> wrote: > > > Yes stream processing and CEP are subtlety different things. > > > > Kafka Streams helps you write stateful apps and allows that state to be > > preserved on disk (a local State store) as well as distributed for HA or > > for parallel partitioned processing (via Kafka topic partitions and > > consumer groups) as well as in memory (as a performance enhancement). > > > > However a classical CEP engine with a pre-modeled state machine and > > pattern matching rules is something different from stream processing. > > > > It is on course possible to build a CEP system on top on Kafka Streams > and > > get the best of both worlds. > > > > -hans > > > > > On Mar 18, 2017, at 11:36 AM, Sabarish Sasidharan < > > sabarish....@gmail.com> wrote: > > > > > > Hans > > > > > > What you state would work for aggregations, but not for state machines > > and > > > CEP. > > > > > > Regards > > > Sab > > > > > >> On 19 Mar 2017 12:01 a.m., "Hans Jespersen" <h...@confluent.io> > wrote: > > >> > > >> The only way to make sure A is consumed first would be to delay the > > >> consumption of message B for at least 15 minutes which would fly in > the > > >> face of the principals of a true streaming platform so the short > answer > > to > > >> your question is "no" because that would be batch processing not > stream > > >> processing. > > >> > > >> However, Kafka Streams does handle late arriving data. So if you had > > some > > >> analytics that computes results on a time window or a session window > > then > > >> Kafka streams will compute on the stream in real time (processing > > message > > >> B) and then later when message A arrives it will put that message back > > into > > >> the right temporal context and publish an amended result for the > proper > > >> time/session window as if message B were consumed in the timestamp > order > > >> before message A. The end result of this flow is that you eventually > get > > >> the same results you would get in a batch processing system but with > the > > >> added benefit of getting intermediary result at much lower latency. > > >> > > >> -hans > > >> > > >> /** > > >> * Hans Jespersen, Principal Systems Engineer, Confluent Inc. > > >> * h...@confluent.io (650)924-2670 > > >> */ > > >> > > >>> On Sat, Mar 18, 2017 at 10:29 AM, Ali Akhtar <ali.rac...@gmail.com> > > wrote: > > >>> > > >>> Is it possible to have Kafka Streams order messages correctly by > their > > >>> timestamps, even if they arrived out of order? > > >>> > > >>> E.g, say Message A with a timestamp of 5:00 PM and Message B with a > > >>> timestamp of 5:15 PM, are sent. > > >>> > > >>> Message B arrives sooner than Message A, due to network issues. > > >>> > > >>> Is it possible to make sure that, across all consumers of Kafka > Streams > > >>> (even if they are across different servers, but have the same > consumer > > >>> group), Message A is consumed first, before Message B? > > >>> > > >>> Thanks. > > >>> > > >> > > >