Hi all,

We'd like to use a FileSource
<https://nightlies.apache.org/flink/flink-docs-release-1.14/api/java/org/apache/flink/connector/file/src/FileSource.html>
to read a Google Cloud Storage (GCS) archive of messages from a Kafka
topic, roughly in order.

Our GCS archive is partitioned into folders by time. However, when we read
it using a FileSource, the messages are processed in a random order. We'd
like to control the order in which the files are read and take advantage
of the clear ordering our GCS archive provides.

What is the best way to achieve this? Would it be possible to write a
custom FileEnumerator
<https://nightlies.apache.org/flink/flink-docs-release-1.14/api/java/org/apache/flink/connector/file/src/enumerate/FileEnumerator.html>
that sorts the directories and returns the splits in order?
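
To make the idea concrete, here is a rough sketch of the ordering we have in
mind, in plain Java with no Flink dependencies. The folder layout, file names,
and the PartitionOrder class are hypothetical and only for illustration; it
also assumes the folder names are zero-padded so that string order matches
time order. We have not wired this into a FileEnumerator, and we are not sure
whether returning splits in this order from the enumerator would be enough, or
whether the split assigner could still hand them out in a different order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Flink: orders archive paths by their
// time-partition folder, oldest first.
final class PartitionOrder {

    // Everything before the last '/' is treated as the time-partition folder.
    static String folderOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? "" : path.substring(0, slash);
    }

    // Returns a sorted copy: by folder first, then by full path within a folder.
    // Assumes zero-padded folder names (e.g. 2022/03/01/09).
    static List<String> sortByPartition(List<String> paths) {
        List<String> sorted = new ArrayList<>(paths);
        sorted.sort((a, b) -> {
            int byFolder = folderOf(a).compareTo(folderOf(b));
            return byFolder != 0 ? byFolder : a.compareTo(b);
        });
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up paths in the assumed year/month/day/hour layout.
        List<String> paths = Arrays.asList(
                "archive/2022/03/02/00/part-0.json",
                "archive/2022/03/01/23/part-1.json",
                "archive/2022/03/01/09/part-0.json");
        for (String p : sortByPartition(paths)) {
            System.out.println(p);
        }
    }
}
```

The thought is that a custom enumerator could list the files as usual and then
apply a sort like this before returning the splits.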

Any help would be greatly appreciated!

Thanks,
Kevin
