[ 
https://issues.apache.org/jira/browse/KAFKA-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097025#comment-17097025
 ] 

Maatari edited comment on KAFKA-7224 at 5/1/20, 12:34 AM:
----------------------------------------------------------

Thank you so much for your clarification; it helps a lot. I will try to clarify 
some of my confusing statements. 
{quote}What do you mean by "at the end of the topology"? There is nothing like 
this. Note that the input is not a "finite table" but the "infinite table 
changelog stream".
{quote}
 
 I just meant having something like this:
{code:java}
ktable0.join(ktable1.groupBy(...).reduce(...)).suppress(...){code}
It was my language here that was misleading. I agree with you: it is not a 
finite table. What I want is to significantly reduce the intermediary 
results. 
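Concretely, I had in mind something along these lines (topic names, serdes, join logic and the 5-minute limit are just placeholders, not my actual topology):
{code:java}
import java.time.Duration;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.Suppressed.BufferConfig;

StreamsBuilder builder = new StreamsBuilder();

KTable<String, String> ktable0 = builder.table("topic-0");
KTable<String, Long> aggregated = builder.<String, String>table("topic-1")
        .groupBy((key, value) -> KeyValue.pair(value, value))
        .count();

ktable0.join(aggregated, (left, count) -> left + ":" + count)
       // rate-limit intermediary results: at most one update per key
       // every 5 minutes (advanced by stream time, per KIP-328)
       .suppress(Suppressed.untilTimeLimit(Duration.ofMinutes(5),
                                           BufferConfig.maxRecords(10_000).emitEarlyWhenFull()))
       .toStream()
       .to("output-topic");
{code}
Note that {{untilTimeLimit}} advances on stream time, which is exactly the limitation I am asking about.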
{quote}That is by design. Because the input may contain out-of-order data, time 
cannot easily be advanced if the input stream "stalls". Otherwise, the whole 
operation becomes non-deterministic (what might be ok for your use case 
though). This would require some wall-clock time emit strategy though (as you 
mentioned already, ie, KIP-424).
{quote}
 
 What you suggest above is exactly what would put me in the right direction, 
given my use case. I will specifically adopt your language *wall-clock time 
emit strategy.* Is that really what was intended in KIP-424? On that page 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-424%3A+Allow+suppression+of+intermediate+events+based+on+wall+clock+time]
 the author specifically says: _"However, checks against the wall clock are 
event driven: if no events are received in over 5 seconds, no events will be 
sent downstream"_
 Hence, just to clarify: do you mean the same thing when you say wall-clock 
time emit strategy? Because if that is the case, the same problem as above 
will happen: some records can still stay stuck if nothing else comes in. This 
matters, because I wanted to ask, from your point of view, whether it is even 
feasible to use wall-clock time as I mean it. That is, once the configured 
time has elapsed for a key, emit the record anyway, even if no new record has 
been ingested. 

 



> KIP-328: Add spill-to-disk for Suppression
> ------------------------------------------
>
>                 Key: KAFKA-7224
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7224
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: John Roesler
>            Priority: Major
>
> As described in 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables]
> Following on KAFKA-7223, implement the spill-to-disk buffering strategy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
