jmckenzie-dev commented on code in PR #176:
URL:
https://github.com/apache/cassandra-analytics/pull/176#discussion_r2892788908
##########
cassandra-four-zero-bridge/src/main/java/org/apache/cassandra/db/commitlog/BufferingCommitLogReader.java:
##########
@@ -309,6 +312,13 @@ private void readCommitLogSegment() throws IOException
break;
}
}
+
+ // If the segment reader iterator completes reading commitlog with
padded zeros, set the position
+ // as maxOffset to mark completion of reading commitlog
Review Comment:
This comment confused me a bit and sent me down a rabbit hole. wdyt about
this instead? Is it accurate?
```
// If the loop finished naturally (iterator exhausted) without hitting an
error or limit,
// ensure the position reflects the end of the file. If we aborted early due
to an error
// or mutation limit, 'this.position' remains at the last valid read offset.
```
##########
cassandra-analytics-cdc/src/test/java/org/apache/cassandra/db/commitlog/BufferingCommitLogReaderTests.java:
##########
@@ -112,18 +94,76 @@ public void testReaderSeek()
prevMarker = marker;
if (marker.equals(allMarkers.get(allMarkers.size() - 1)))
- {
- // last marker should return 0 updates
- // and be at the end of the file
assertThat(result).isEmpty();
- }
else
- {
assertThat(result).isNotEmpty();
- }
}
}
+ @Test
+ public void testPositionAtMaxOffsetInCommitlogsWithPaddedZeros()
+ {
+ CommitLog log = writeAndSync(100, null);
+ // Seek to maxOffset so the loop runs 0 iterations (natural
exhaustion);
+ // Fix 1 sets position = startMarker.position, Fix 2 sets position =
maxOffset on clean exit.
+ Marker maxOffsetMarker = log.markerAt(log.segmentId(), (int)
log.maxOffset());
+ try (BufferingCommitLogReader reader = new
BufferingCommitLogReader(log, maxOffsetMarker, CdcStats.STUB, null))
+ {
+ CommitLogReader.Result result = reader.result();
+ assertThat(reader.position()).isEqualTo((int) log.maxOffset());
+ assertThat(result.isFullyRead()).isTrue();
+ }
+ }
+
+ @Test
+ public void testPositionAtMaxOffsetWhenSeekingToEnd()
Review Comment:
Excepting the `assertThat(result.updates()).isEmpty();` check, this test is
identical to `testPositionAtMaxOffsetInCommitlogsWithPaddedZeros()`. Maybe we
keep this one and just leave a javadoc breadcrumb to the JIRA ticket here and
the 2 cases this test covers?
##########
cassandra-four-zero-bridge/src/main/java/org/apache/cassandra/db/commitlog/BufferingCommitLogReader.java:
##########
@@ -427,6 +437,7 @@ private void readSection(FileDataInput reader,
if (serializedSize == LEGACY_END_OF_SEGMENT_MARKER)
{
logger.trace("Encountered end of segment marker at",
"position", reader.getFilePointer());
+ this.position = (int) log.maxOffset();
Review Comment:
Do we have a unit test that exercises this case of "hit
`LEGACY_END_OF_SEGMENT_MARKER`?
##########
cassandra-four-zero-bridge/src/main/java/org/apache/cassandra/db/commitlog/BufferingCommitLogReader.java:
##########
@@ -284,6 +284,9 @@ private void readCommitLogSegment() throws IOException
{
stats.commitLogBytesSkippedOnRead(startMarker.position() -
reader.getFilePointer());
segmentReader.seek(startMarker.position());
+ // When starting from an offset, position must be initialized
to startMarker.position()
Review Comment:
Just leaving a note so we're aligned (no change requested). The relationship
between reader, startMarker, and this.position is something we should consider
structurally relating in the future. If there's an expectation that these
numeric sentinels are linked, wrapping them in code that enforces those
invariants would help prevent gaps and allow us to refactor more safely.
Again - nothing to do here. ;) Just my first thoughts on looking at this
code in detail. I think this is worth considering for a broader refactor; this
BufferingCommitLogReader seems like a leaky abstraction on top of the regular
commit log reader that's forcing us to do a lot of heavy lifting and exposing
us to a lot of potential pain if we change it too much. 😬
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]