Hi Hector,

The main reasons for deprecating readFileStream() were:

1) It could only parse Strings, and in a rather limited way: one could not even specify the encoding.
2) It was not fault-tolerant, so your concerns about exactly-once were not covered.

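To make the second point concrete, here is a minimal sketch in plain Java of reading only the appended part of a file by remembering a byte offset per file. It is not Flink's actual implementation; the class `OffsetTrackingReader` and its methods are hypothetical names made up for illustration. The offsets live in an ordinary in-memory map, so they are lost when the process restarts, and the map gains one entry for every file ever seen.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not Flink code.
class OffsetTrackingReader {
    // One entry per file ever seen. Nothing is ever evicted, and nothing is
    // persisted, so the map grows without bound and is empty after a restart.
    private final Map<Path, Long> lastReadOffset = new HashMap<>();
    private final Charset charset;

    OffsetTrackingReader(Charset charset) {
        this.charset = charset;
    }

    /** Returns the content appended to the file since the previous call. */
    String readAppended(Path file) throws IOException {
        long start = lastReadOffset.getOrDefault(file, 0L);
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long end = raf.length();
            if (end <= start) {
                return ""; // nothing new; truncated files are not handled here
            }
            // Simplification: assumes the new part fits in one byte array and
            // that no multi-byte character is split across two reads.
            byte[] buf = new byte[(int) (end - start)];
            raf.seek(start);
            raf.readFully(buf);
            lastReadOffset.put(file, end);
            return new String(buf, charset);
        }
    }

    /** Number of files whose offsets are currently held in memory. */
    int trackedFiles() {
        return lastReadOffset.size();
    }
}
```

Calling readAppended(path) repeatedly on a growing file returns only the new part each time. After a crash the map starts out empty again, so every file would be re-read from offset 0.
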
One concern I can see with keeping the last read index for every file we have seen so far is that it would simply blow up the memory.

Two more things I would like to mention:

1) The method was deprecated a long time ago.
2) There is a new FileSource coming with 1.12 that may be interesting for you [1].

Cheers,
Kostas

[1] https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/FileSource.java

On Tue, Nov 17, 2020 at 4:30 AM Hector He <hecto...@qq.com> wrote:
>
> May I ask about the deprecation of readFileStream(...)? Is there an
> alternative to this method? The source code led me to use readFile instead,
> but it does not behave like readFileStream: readFileStream can read file
> content incrementally, whereas readFile with the
> FileProcessingMode.PROCESS_CONTINUOUSLY argument re-reads the entire file
> content every time the content changes. So why did Flink deprecate
> readFileStream without providing a better alternative?
>
> According to the official documentation linked below,
> FileProcessingMode.PROCESS_CONTINUOUSLY breaks the “exactly-once”
> semantics.
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/datastream_api.html#data-sources