Martijn Visser created FLINK-40016:
--------------------------------------
Summary:
UnalignedCheckpointRescaleITCase.shouldRescaleUnalignedCheckpoint fails with
"Corrupt stream, found tag" (in-flight stream desync) on rescale recovery
Key: FLINK-40016
URL: https://issues.apache.org/jira/browse/FLINK-40016
Project: Flink
Issue Type: Bug
Components: Runtime / Checkpointing
Reporter: Martijn Visser
[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=76170&view=results]
(leg: test_cron_azure tests)
{code:java}
org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at
org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:180)
at
org.apache.flink.test.checkpointing.UnalignedCheckpointTestBase.execute(UnalignedCheckpointTestBase.java:217)
at
org.apache.flink.test.checkpointing.UnalignedCheckpointRescaleITCase.shouldRescaleUnalignedCheckpoint(UnalignedCheckpointRescaleITCase.java:683)
Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by
FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=1,
backoffTimeMS=100)
Caused by: java.io.IOException: Can't get next record for channel
InputChannelInfo{gateIdx=0, inputChannelIdx=5}
Caused by: java.io.IOException: Corrupt stream, found tag: 8
{code}
{{org.apache.flink.test.checkpointing.UnalignedCheckpointRescaleITCase.shouldRescaleUnalignedCheckpoint}}
(parameter {{{}[19]{}}}) fails on rescale recovery: the restored in-flight
byte stream is misaligned, so {{StreamElementSerializer}} reads a data byte
where an element tag is expected. Valid tags are 0-6; "tag 8" is out of range,
confirming a stream desync rather than a format mismatch.
Suspected to be in the recovered in-flight-buffer / filtering-recovery path
that landed in master 2026-02 .. 2026-04 (FLINK-39018 buffer migration from
RecoveredInputChannel to physical channels; FLINK-38930 filtering record before
processing without spilling / heap-buffer fallback during recovery; FLINK-39519
reusable heap segment for pre-filter source buffers). The failing build's head
commit (4902753429, FLINK-35562) is a flink-table-planner test-only change and
cannot have introduced this, so the defect is latent in master.
Related: FLINK-22197 (same test/method but a {{{}NoSuchFileException{}}},
different cause), FLINK-38643 (hang), FLINK-35351 / FLINK-39162 (closed
UC-corruption, custom-partitioner specific).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)