Hi Krzysztof,

If I understand correctly, managed operator state might not help here, since Flink currently supports only in-memory operator state.
Would it be possible, for now, to use a customized SplitEnumerator that skips the already processed files in some other way? For example, if the files have different creation times, we could process them in time order and maintain only the latest file creation time plus the list of processed files that share that time.

Best,
Yun

------------------ Original Mail ------------------
Sender: Krzysztof Chmielewski <krzysiek.chmielew...@gmail.com>
Send Date: Thu Dec 23 06:33:07 2021
Recipients: user <user@flink.apache.org>
Subject: Operator state in New Source API

Hi,

Is it possible to use managed operator state, such as MapState, in an implementation of the new unified source interface [1]? I'm especially interested in using managed state in a SplitEnumerator implementation.

I have a use case that is a variation of the File Source, where I will have a great number of files to process, for example a million. I know that FileSource maintains a collection of already processed paths in the ContinuousFileSplitEnumerator object. In my case I cannot afford to have all million Strings sitting on my heap. I'm hoping to use operator state for this and build splits in batches, periodically adding new files to the alreadyProcessedPaths collection.

Regards,
Krzysztof Chmielewski

[1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/
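
For illustration, the time-ordered bookkeeping suggested in the reply could look roughly like the sketch below. It is plain Java with no Flink dependency, and `ProcessedFileTracker` with all of its methods is a hypothetical helper, not a Flink class. It assumes that files are handed to it in non-decreasing creation-time order and that no file with an older creation time shows up later; if that does not hold for the file system in question, such late files would be skipped.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper (not part of Flink): remembers only the latest
 * creation time seen so far and the paths that share exactly that time,
 * instead of every processed path.
 */
final class ProcessedFileTracker {
    private long latestCreatedTime = Long.MIN_VALUE;
    private final Set<String> pathsAtLatestTime = new HashSet<>();

    /** True if the file is older than the latest time, or recorded at the latest time. */
    boolean isAlreadyProcessed(String path, long createdTime) {
        if (createdTime < latestCreatedTime) {
            return true; // assumption: everything older was handled earlier
        }
        return createdTime == latestCreatedTime && pathsAtLatestTime.contains(path);
    }

    /** Records a processed file; files must arrive in non-decreasing time order. */
    void markProcessed(String path, long createdTime) {
        if (createdTime < latestCreatedTime) {
            throw new IllegalArgumentException("files must be processed in time order");
        }
        if (createdTime > latestCreatedTime) {
            // A newer timestamp makes the older path set redundant.
            latestCreatedTime = createdTime;
            pathsAtLatestTime.clear();
        }
        pathsAtLatestTime.add(path);
    }

    /** Number of paths currently held in memory. */
    int trackedPathCount() {
        return pathsAtLatestTime.size();
    }
}
```

A custom enumerator could consult `isAlreadyProcessed` while listing files and would only need to checkpoint the single timestamp and the small path set, so memory use depends on how many files share one creation time rather than on the total number of files.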