jmckenzie-dev commented on code in PR #176:
URL: 
https://github.com/apache/cassandra-analytics/pull/176#discussion_r2892788908


##########
cassandra-four-zero-bridge/src/main/java/org/apache/cassandra/db/commitlog/BufferingCommitLogReader.java:
##########
@@ -309,6 +312,13 @@ private void readCommitLogSegment() throws IOException
                     break;
                 }
             }
+
+            // If the segment reader iterator completes reading commitlog with 
padded zeros, set the position
+            // as maxOffset to mark completion of reading commitlog

Review Comment:
   This comment confused me a bit and sent me down a rabbit hole. wdyt about 
this instead? Is it accurate?
   ```
   // If the loop finished naturally (iterator exhausted) without hitting an 
error or limit,
   // ensure the position reflects the end of the file. If we aborted early due 
to an error
   // or mutation limit, 'this.position' remains at the last valid read offset.
   ```



##########
cassandra-analytics-cdc/src/test/java/org/apache/cassandra/db/commitlog/BufferingCommitLogReaderTests.java:
##########
@@ -112,18 +94,76 @@ public void testReaderSeek()
             prevMarker = marker;
 
             if (marker.equals(allMarkers.get(allMarkers.size() - 1)))
-            {
-                // last marker should return 0 updates
-                // and be at the end of the file
                 assertThat(result).isEmpty();
-            }
             else
-            {
                 assertThat(result).isNotEmpty();
-            }
         }
     }
 
+    @Test
+    public void testPositionAtMaxOffsetInCommitlogsWithPaddedZeros()
+    {
+        CommitLog log = writeAndSync(100, null);
+        // Seek to maxOffset so the loop runs 0 iterations (natural 
exhaustion);
+        // Fix 1 sets position = startMarker.position, Fix 2 sets position = 
maxOffset on clean exit.
+        Marker maxOffsetMarker = log.markerAt(log.segmentId(), (int) 
log.maxOffset());
+        try (BufferingCommitLogReader reader = new 
BufferingCommitLogReader(log, maxOffsetMarker, CdcStats.STUB, null))
+        {
+            CommitLogReader.Result result = reader.result();
+            assertThat(reader.position()).isEqualTo((int) log.maxOffset());
+            assertThat(result.isFullyRead()).isTrue();
+        }
+    }
+
+    @Test
+    public void testPositionAtMaxOffsetWhenSeekingToEnd()

Review Comment:
   Excepting the `assertThat(result.updates()).isEmpty();` check, this test is 
identical to `testPositionAtMaxOffsetInCommitlogsWithPaddedZeros()`. Maybe we 
keep this one and just leave a javadoc breadcrumb to the JIRA ticket here and 
the 2 cases this test covers?



##########
cassandra-four-zero-bridge/src/main/java/org/apache/cassandra/db/commitlog/BufferingCommitLogReader.java:
##########
@@ -427,6 +437,7 @@ private void readSection(FileDataInput reader,
                 if (serializedSize == LEGACY_END_OF_SEGMENT_MARKER)
                 {
                     logger.trace("Encountered end of segment marker at", 
"position", reader.getFilePointer());
+                    this.position = (int) log.maxOffset();

Review Comment:
   Do we have a unit test that exercises this case of "hit 
`LEGACY_END_OF_SEGMENT_MARKER`?



##########
cassandra-four-zero-bridge/src/main/java/org/apache/cassandra/db/commitlog/BufferingCommitLogReader.java:
##########
@@ -284,6 +284,9 @@ private void readCommitLogSegment() throws IOException
             {
                 stats.commitLogBytesSkippedOnRead(startMarker.position() - 
reader.getFilePointer());
                 segmentReader.seek(startMarker.position());
+                // When starting from an offset, position must be initialized 
to startMarker.position()

Review Comment:
   Just leaving a note so we're aligned (no change requested). The relationship 
between reader, startMarker, and this.position is something we should consider 
structurally relating in the future. If there's an expectation that these 
numeric sentinels are linked, wrapping them in code that enforces those 
invariants would help prevent gaps and allow us to refactor more safely.
   
   Again - nothing to do here. ;) Just my first thoughts on looking at this 
code in detail. I think this is worth considering for a broader refactor; this 
BufferingCommitLogReader seems like a leaky abstraction on top of the regular 
commit log reader that's forcing us to do a lot of heavy lifting and exposing 
us to a lot of potential pain if we change it too much. 😬



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to