Matthias J. Sax created KAFKA-7934:
--------------------------------------

             Summary: Optimize restore for windowed and session stores
                 Key: KAFKA-7934
                 URL: https://issues.apache.org/jira/browse/KAFKA-7934
             Project: Kafka
          Issue Type: Improvement
          Components: streams
            Reporter: Matthias J. Sax


During state restore of window/session stores, the changelog topic is scanned 
from the oldest entry to the newest entry. This happens either record by record 
or in record batches.

During this process, new segments are created as time advances (based on the 
timestamps of the records being restored). However, depending on the 
retention time, some of these segments may be expired again later in the 
restore process. This is wasteful. Because retention time is based on the 
largest timestamp per partition, it is possible to compute the bound between 
live and expired segments upfront (assuming that we know the largest 
timestamp). This way, during restore, we could avoid creating segments that 
would be expired later anyway and skip over all corresponding records.
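
To illustrate the idea, here is a minimal sketch of how such a bound could be 
computed and applied. It is not the actual Kafka Streams restore code: the 
class and method names are hypothetical, and it assumes non-negative 
timestamps, a segment id derived as timestamp / segmentInterval, and that the 
largest timestamp of the partition is already known.

```java
import java.util.ArrayList;
import java.util.List;

public class RestoreBoundSketch {

    // Assumption: a record with this timestamp lands in segment (timestamp / interval).
    public static long segmentId(long timestampMs, long segmentIntervalMs) {
        return timestampMs / segmentIntervalMs;
    }

    // Id of the oldest segment that is still live once the largest timestamp is reached.
    // Every segment with a smaller id would be expired by the end of the restore.
    public static long minLiveSegmentId(long maxTimestampMs, long retentionMs, long segmentIntervalMs) {
        long minLiveTimestampMs = Math.max(0L, maxTimestampMs - retentionMs);
        return segmentId(minLiveTimestampMs, segmentIntervalMs);
    }

    // Returns the timestamps of the records that would actually be restored;
    // records that fall into an already-expired segment are skipped.
    public static List<Long> timestampsToRestore(long[] changelogTimestampsMs,
                                                 long maxTimestampMs,
                                                 long retentionMs,
                                                 long segmentIntervalMs) {
        long minLive = minLiveSegmentId(maxTimestampMs, retentionMs, segmentIntervalMs);
        List<Long> kept = new ArrayList<>();
        for (long ts : changelogTimestampsMs) {
            if (segmentId(ts, segmentIntervalMs) >= minLive) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up changelog timestamps, oldest first; largest timestamp is 200.
        long[] changelog = {5L, 40L, 95L, 130L, 170L, 200L};
        // Retention 100 and segment interval 50 give a minimum live segment id of 2.
        System.out.println(timestampsToRestore(changelog, 200L, 100L, 50L));
        // prints [130, 170, 200]
    }
}
```

In this example the records with timestamps 5, 40, and 95 belong to segments 0 
and 1, which would be created and then dropped again by a naive restore; with 
the bound known upfront they are never materialized.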

The problem is that we don't know the largest timestamp per partition. Maybe 
the broker timestamp index could help to provide an approximation for this 
value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
