This is the loop - code reference
<https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/ContinuousFileReaderOperator.java#L346>,
where it fetches all records from the split, and then only the
MailboxProcessor gets control to check other mail. This loop was introduced
here
<https://github.com/apache/flink/commit/1a69cb9fce629b0c458f5ea514d9ac8de008687f>
.




On Tue, Dec 5, 2023 at 9:00 PM Darin Amos <darin.a...@instacart.com.invalid>
wrote:

> I thought for sure this was already the existing behavior with this
> operator. Does it not check the mailbox executor after every record read?
>
> On Tue, Dec 5, 2023 at 6:48 AM Prabhu Joseph (Jira) <j...@apache.org>
> wrote:
>
> > Prabhu Joseph created FLINK-33753:
> > -------------------------------------
> >
> >              Summary: ContinuousFileReaderOperator consume records as
> mini
> > batch
> >                  Key: FLINK-33753
> >                  URL: https://issues.apache.org/jira/browse/FLINK-33753
> >              Project: Flink
> >           Issue Type: Improvement
> >     Affects Versions: 1.18.0
> >             Reporter: Prabhu Joseph
> >
> >
> > The ContinuousFileReaderOperator reads and collects the records from a
> > split in a loop. If the split size is large, then the loop will take more
> > time, and then the mailbox executor won't have a chance to process the
> > checkpoint barrier. This leads to checkpoint timing out.
> > ContinuousFileReaderOperator could be improved to consume the records in
> a
> > mini batch, similar to Hudi's StreamReadOperator (
> > https://issues.apache.org/jira/browse/HUDI-2485).
> >
> >
> >
> > --
> > This message was sent by Atlassian Jira
> > (v8.20.10#820010)
> >
>

Reply via email to