John Roesler created KAFKA-8770:
-----------------------------------

             Summary: Either switch to or add an option for emit-on-change
                 Key: KAFKA-8770
                 URL: https://issues.apache.org/jira/browse/KAFKA-8770
             Project: Kafka
          Issue Type: Improvement
          Components: streams
            Reporter: John Roesler


Currently, Streams offers two emission models:
* emit-on-window-close: (using Suppression)
* emit-on-update: (i.e., emit a new result whenever a new record is processed, 
regardless of whether the result has changed)

There is also an option to drop some intermediate results, either using caching 
or suppression.

However, there is no support for emit-on-change, in which results would be 
forwarded only if the result has changed. This has been reported to be 
extremely valuable as a performance optimizations for some high-traffic 
applications, and it reduces the computational burden both internally for 
downstream Streams operations, as well as for external systems that consume the 
results, and currently have to deal with a lot of "no-op" changes.

It would be pretty straightforward to implement this, by loading the prior 
results before a stateful operation and comparing with the new result before 
persisting or forwarding. In many cases, we load the prior result anyway, so it 
may not be a significant performance impact either.

One design challenge is what to do with timestamps. If we get one record at 
time 1 that produces a result, and then another at time 2 that produces a 
no-op, what should be the timestamp of the result, 1 or 2? emit-on-change would 
require us to say 1.

Clearly, we'd need to do some serious benchmarks to evaluate any potential 
implementation of emit-on-change.

Another design challenge is to decide if we should just automatically provide 
emit-on-change for stateful operators, or if it should be configurable. 
Configuration increases complexity, so unless the performance impact is high, 
we may just want to change the emission model without a configuration.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

Reply via email to