Sören Henning created KAFKA-9647:
------------------------------------

             Summary: Add ability to  suppress until window end (not close)
                 Key: KAFKA-9647
                 URL: https://issues.apache.org/jira/browse/KAFKA-9647
             Project: Kafka
          Issue Type: Wish
          Components: streams
            Reporter: Sören Henning


*Preface:* This feature request originates from a [recently asked question on 
Stack 
Overflow|https://stackoverflow.com/questions/60005630/kafka-streams-suppress-until-window-end-not-close],
 for which Matthias J. Sax suggested to create a feature request.

*Feature Request:* In addition to suppressing updates to a windowed KTable 
until a window closes, we suggest to only suppress "early" results. By early 
results we mean results computed before the window ends, but not those results 
occurring during the grace period. Thus, this suppress option would suppress 
all aggregation results with timestamp < window end, but forward all records 
with timestamp >= window end and timestamp < window close.

*Use Case:* For an exemplary use case, we refer to John Roesler's [blog post on 
the initial introduction of the suppress 
operator|https://www.confluent.io/blog/kafka-streams-take-on-watermarks-and-triggers/].
 The post argues that for the case of altering not every intermediate 
aggregation result should trigger an alert message, but only the "final" 
result. Otherwise, a "follow-up email telling people to ignore the first 
message" might become required if the final results would not cause an alert 
but intermediate results would. Kafka Streams' current solution for this use 
case would be to use a suppress operation, which would only forward the final 
result, which would be the last result before no further updates could occur. 
This is when the grace period of a window passed (the window closes).

However, ideally we would like to set the grace period a large as possible to 
allow for very late-arriving messages, which in turn would lead to very late 
alerts. On the other hand, such late-arriving messages are rare in practice and 
normally the order of events corresponds largely to the order of messages. 
Thus, a reasonable option would be to suppress aggregation results only until 
the window ends (i.e. stream time > window end) and then forward this "most 
likely final" result. For the use case of altering, this means an alert is 
triggered when we are relatively certain that recorded data requires an alert. 
Then, only the "seldom" case of late updates which would change our decision 
would require the "follow-up email telling people to ignore the first message". 
Such rare "correction" should be acceptable for many use cases.

*Further extension:* In addition to suppressing all updates until the window 
ends and afterwards forwarding all updates, a further extension would be to 
only forward late records every x seconds. Maybe the existing 
`Suppressed.untilTimeLimit( .. )` could be reused for this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to