[ 
https://issues.apache.org/jira/browse/KAFKA-10847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290685#comment-17290685
 ] 

Guozhang Wang commented on KAFKA-10847:
---------------------------------------

I agree that we should delete records from the store once we emit the joined 
results for expired records. Since we currently do a range query + deletion 
per input record, I guess that in practice each call would expire only very 
few records.

If range query + deletion turns out to be an overhead in practice, we can 
consider 1) doing the range query + deletion less frequently, so that each 
time we get a reasonable number of records to expire, and 2) using range 
deletion (https://rocksdb.org/blog/2018/11/21/delete-range.html), which would 
be efficient especially if we have more records to expire in one call.
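
The two options above can be sketched roughly as follows. This is only an 
illustration, not the Kafka Streams implementation: a `TreeMap` stands in for 
the time-ordered store, and the class `ThrottledExpiry`, its fields, and the 
"strictly older than the cutoff" expiry rule are all hypothetical. The expiry 
pass is skipped unless enough stream time has passed since the last pass, and 
when it does run, the whole expired range is dropped in one call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for the store of records still waiting for a join partner.
// Simplification: one record per timestamp.
class ThrottledExpiry {
    private final NavigableMap<Long, String> buffer = new TreeMap<>(); // timestamp -> record
    private final long windowAndGraceMs;    // assumed: window size plus grace period
    private final long minExpiryIntervalMs; // assumed: minimum stream time between expiry passes
    private boolean hasExpiredBefore = false;
    private long lastExpiryAt = 0L;

    ThrottledExpiry(long windowAndGraceMs, long minExpiryIntervalMs) {
        this.windowAndGraceMs = windowAndGraceMs;
        this.minExpiryIntervalMs = minExpiryIntervalMs;
    }

    void add(long timestamp, String record) {
        buffer.put(timestamp, record);
    }

    int size() {
        return buffer.size();
    }

    // Returns the records that expired in this pass (the ones that would be
    // emitted as left/outer results), or an empty list if the pass was skipped.
    List<String> maybeExpire(long streamTime) {
        // Option 1: run the range query + deletion less frequently.
        if (hasExpiredBefore && streamTime - lastExpiryAt < minExpiryIntervalMs) {
            return new ArrayList<>();
        }
        hasExpiredBefore = true;
        lastExpiryAt = streamTime;

        long cutoff = streamTime - windowAndGraceMs;
        // View of all entries with timestamp strictly below the cutoff.
        NavigableMap<Long, String> expired = buffer.headMap(cutoff, false);
        List<String> emitted = new ArrayList<>(expired.values());
        // Option 2: delete the whole range in one call instead of key by key.
        expired.clear();
        return emitted;
    }
}
```

With a larger `minExpiryIntervalMs`, each pass finds more records to expire, 
which is the case where a single range deletion pays off most.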

bq. I do a single-lookup in the store to check if the key is there, if not, 
then it continues; otherwise it calls the put(key, null) to delete it.

This is just syntactic sugar: you can call `putIfAbsent(key, null)` instead.

> Avoid spurious left/outer join results in stream-stream join 
> -------------------------------------------------------------
>
>                 Key: KAFKA-10847
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10847
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Assignee: Sergio Peña
>            Priority: Major
>
> KafkaStreams follows an eager execution model, i.e., it never buffers input 
> records but processes them right away. For left/outer stream-stream join, 
> this implies that a left/outer join result might be emitted before the window 
> end (or window close) time is reached. Thus, a record that will be an 
> inner-join result might produce an eager (and spurious) left/outer join 
> result.
> We should change the implementation of the join to not emit eager left/outer 
> join results, but instead delay the emission of such results until after the 
> window grace period has passed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
