[ https://issues.apache.org/jira/browse/FLINK-28817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578507#comment-17578507 ]
Thomas Weise commented on FLINK-28817: -------------------------------------- [~Benenson] thanks for investigating this issue. I think that has to do with the reader not having any restored splits (most likely because non were assigned) and therefore reporting -1 back to the enumerator. Let me check what the correct fix for this is. > NullPointerException in HybridSource when restoring from checkpoint > ------------------------------------------------------------------- > > Key: FLINK-28817 > URL: https://issues.apache.org/jira/browse/FLINK-28817 > Project: Flink > Issue Type: Bug > Components: Connectors / Common > Affects Versions: 1.14.4, 1.15.1 > Reporter: Michael > Priority: Major > Labels: pull-request-available > Attachments: Preconditions-checkNotNull-error.zip, > bf-29-JM-err-analysis.log > > > Scenario: > # CheckpointCoordinator - Completed checkpoint 14 for job > 00000000000000000000000000000000 > # HybridSource successfully completed processing a few SourceFactories, that > reads from s3 > # HybridSourceSplitEnumerator.switchEnumerator failed with > com.amazonaws.SdkClientException: Unable to execute HTTP request: Read timed > out. This is intermittent error, it is usually fixed, when Flink recover from > checkpoint & repeat the operation. > # Flink starts recovering from checkpoint, > # HybridSourceSplitEnumerator receives > SourceReaderFinishedEvent\{sourceIndex=-1} > # Processing this event cause > 2022/08/08 08:39:34.862 ERROR o.a.f.r.s.c.SourceCoordinator - Uncaught > exception in the SplitEnumerator for Source Source: hybrid-source while > handling operator event SourceEventWrapper[SourceReaderFinishedEvent > {sourceIndex=-1} > ] from subtask 6. Triggering job failover. > java.lang.NullPointerException: Source for index=0 is not available from > sources: \{788=org.apache.flink.connector.file.src.SppFileSource@5a3803f3} > at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:104) > at > org.apache.flink.connector.base.source.hybridspp.SwitchedSources.sourceOf(SwitchedSources.java:36) > at > org.apache.flink.connector.base.source.hybridspp.HybridSourceSplitEnumerator.sendSwitchSourceEvent(HybridSourceSplitEnumerator.java:152) > at > org.apache.flink.connector.base.source.hybridspp.HybridSourceSplitEnumerator.handleSourceEvent(HybridSourceSplitEnumerator.java:226) > ... > I'm running my version of the Hybrid Sources with additional logging, so line > numbers & some names could be different from Flink Github. > My Observation: the problem is intermittent, sometimes it works ok, i.e. > SourceReaderFinishedEvent comes with correct sourceIndex. As I see from my > log, it happens if my SourceFactory.create() is executed BEFORE > HybridSourceSplitEnumerator - handleSourceEvent > SourceReaderFinishedEvent\{sourceIndex=-1}. > If HybridSourceSplitEnumerator - handleSourceEvent is executed before my > SourceFactory.create(), then sourceIndex=-1 in SourceReaderFinishedEvent > Preconditions-checkNotNull-error log from JobMgr is attached -- This message was sent by Atlassian Jira (v8.20.10#820010)