ming li created FLINK-31008:
-------------------------------

Summary: [Flink][Table Store] The Split allocation of the same bucket in ContinuousFileSplitEnumerator may be out of order
Key: FLINK-31008
URL: https://issues.apache.org/jira/browse/FLINK-31008
Project: Flink
Issue Type: Bug
Components: Table Store
Reporter: ming li
There are two places in {{ContinuousFileSplitEnumerator}} that add {{FileStoreSourceSplit}} to {{bucketSplits}}: {{addSplitsBack}} and {{processDiscoveredSplits}}. {{processDiscoveredSplits}} continuously checks for new splits and appends them to the queue of their bucket, so at this point the splits in each queue are in order.

{code:java}
private void addSplits(Collection<FileStoreSourceSplit> splits) {
    splits.forEach(this::addSplit);
}

private void addSplit(FileStoreSourceSplit split) {
    // Group splits by bucket; LinkedList.add appends each split to the tail of its bucket's queue.
    bucketSplits
            .computeIfAbsent(((DataSplit) split.split()).bucket(), i -> new LinkedList<>())
            .add(split);
}
{code}

However, when a task fails over, the splits that were previously assigned to it are returned to the enumerator. These returned splits are also added to the end of the queue, so they are assigned again only after splits that were discovered later, which breaks the allocation order within a bucket. I think the returned splits should be added to the head of the queue to preserve the allocation order.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
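A minimal sketch of the proposed fix, assuming simplified stand-in types: {{SimpleSplit}} and {{BucketSplitQueues}} are hypothetical names, not Flink classes, and the sketch assumes the returned splits arrive in their original assignment order. Newly discovered splits are appended to the tail of their bucket's queue, while splits handed back on failover are pushed onto the head (iterating in reverse so {{addFirst}} keeps their relative order).

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for FileStoreSourceSplit: just a bucket and a sequence number. */
final class SimpleSplit {
    final int bucket;
    final long seq;

    SimpleSplit(int bucket, long seq) {
        this.bucket = bucket;
        this.seq = seq;
    }
}

/** Hypothetical per-bucket queues; a sketch, not the actual ContinuousFileSplitEnumerator. */
final class BucketSplitQueues {
    private final Map<Integer, LinkedList<SimpleSplit>> bucketSplits = new HashMap<>();

    /** Newly discovered splits are already in order, so they are appended to the tail. */
    void addDiscovered(Collection<SimpleSplit> splits) {
        for (SimpleSplit s : splits) {
            bucketSplits.computeIfAbsent(s.bucket, b -> new LinkedList<>()).addLast(s);
        }
    }

    /** Splits returned on failover were assigned earlier, so they go back to the head. */
    void addSplitsBack(List<SimpleSplit> splits) {
        // Walk the list backwards so that repeated addFirst calls keep the original order.
        for (int i = splits.size() - 1; i >= 0; i--) {
            SimpleSplit s = splits.get(i);
            bucketSplits.computeIfAbsent(s.bucket, b -> new LinkedList<>()).addFirst(s);
        }
    }

    /** Returns the next split for the bucket, or null if none is pending. */
    SimpleSplit poll(int bucket) {
        LinkedList<SimpleSplit> queue = bucketSplits.get(bucket);
        return queue == null ? null : queue.pollFirst();
    }
}
```

For example, if splits 3 and 4 of bucket 0 are discovered and then splits 1 and 2 are returned after a failover, polling bucket 0 yields 1, 2, 3, 4; with tail insertion it would yield 3, 4, 1, 2.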