FrankYang0529 commented on code in PR #18012:
URL: https://github.com/apache/kafka/pull/18012#discussion_r1907410734


##########
storage/src/main/java/org/apache/kafka/storage/internals/log/LogSegment.java:
##########
@@ -257,13 +257,21 @@ public void append(long largestOffset,
             if (largestTimestampMs > maxTimestampSoFar()) {
                 maxTimestampAndOffsetSoFar = new TimestampOffset(largestTimestampMs, shallowOffsetOfMaxTimestamp);
             }
-            // append an entry to the index (if needed)
+            // append an entry to the timestamp index at MemoryRecords level (if needed)
             if (bytesSinceLastIndexEntry > indexIntervalBytes) {
-                offsetIndex().append(largestOffset, physicalPosition);
                 timeIndex().maybeAppend(maxTimestampSoFar(), shallowOffsetOfMaxTimestampSoFar());
-                bytesSinceLastIndexEntry = 0;
             }
-            bytesSinceLastIndexEntry += records.sizeInBytes();
+
+            // append an entry to the offset index at batches level (if needed)
+            for (RecordBatch batch : records.batches()) {
+                if (bytesSinceLastIndexEntry > indexIntervalBytes &&
+                    batch.lastOffset() >= offsetIndex().lastOffset()) {
+                    offsetIndex().append(batch.lastOffset(), physicalPosition);

Review Comment:
   Hi @junrao, thanks for the review. I addressed both comments.
   
   Regarding timestamps: timestamps are not always monotonic within records, so the timestamp index cannot locate offsets as precisely as the offset index does. We could consider whether it is worth appending a timestamp index entry per batch as well, since that operation would introduce additional cost.
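   
   A minimal, self-contained sketch of the per-batch indexing idea (hypothetical `Batch` record and counters standing in for Kafka's `RecordBatch`/`OffsetIndex`; the interval value is made up for illustration): an offset-index entry is taken only after `indexIntervalBytes` bytes have accumulated, and only for an offset beyond the last indexed one.
   
   ```java
   import java.util.List;
   
   // Hypothetical stand-in demonstrating the sparse offset-index append
   // per batch, as in the diff above; not Kafka's actual classes.
   public class SparseIndexSketch {
       record Batch(long lastOffset, int sizeInBytes) {}
   
       static long lastIndexedOffset = -1L;    // stand-in for offsetIndex().lastOffset()
       static int bytesSinceLastIndexEntry = 0;
       static int indexIntervalBytes = 100;    // illustrative value
       static int indexEntries = 0;
   
       static void append(List<Batch> batches) {
           for (Batch batch : batches) {
               // Index only once enough bytes accumulated, and never
               // re-index an offset at or below the last indexed one.
               if (bytesSinceLastIndexEntry > indexIntervalBytes
                       && batch.lastOffset() > lastIndexedOffset) {
                   lastIndexedOffset = batch.lastOffset();
                   indexEntries++;
                   bytesSinceLastIndexEntry = 0;
               }
               bytesSinceLastIndexEntry += batch.sizeInBytes();
           }
       }
   
       public static void main(String[] args) {
           // Three 60-byte batches: the interval is crossed after two batches,
           // so a single index entry is taken at the third batch's last offset.
           append(List.of(new Batch(9, 60), new Batch(19, 60), new Batch(29, 60)));
           System.out.println(indexEntries + " entry at offset " + lastIndexedOffset);
       }
   }
   ```
   
   The key difference from the old code is that the accumulated-bytes check now runs per batch, so a single large `MemoryRecords` append can still yield multiple index entries at batch granularity.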



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
