Hi Nitin,

In the Flink file system connector, Flink keeps a collection of already-processed 
file paths in its state to decide whether a file has been processed [1]. 
So if a file's content has been updated but its path has not changed, the file 
will not be reprocessed.
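
The de-duplication looks roughly like this (a simplified sketch of the logic 
around [1], not the exact code; the wrapper class and method names here are 
illustrative):

import java.util.Collection;
import java.util.HashSet;
import java.util.stream.Collectors;
import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.core.fs.Path;

// Sketch of the de-duplication done by ContinuousFileSplitEnumerator [1].
class PathDedupSketch {
    // The set of paths already handed out, kept in checkpointed state.
    private final HashSet<Path> pathsAlreadyProcessed = new HashSet<>();

    Collection<FileSourceSplit> filterNewSplits(Collection<FileSourceSplit> discovered) {
        // A split is kept only if its path has never been seen before, so a
        // rediscovered file with an unchanged path is dropped even when its
        // content has changed.
        return discovered.stream()
                .filter(split -> pathsAlreadyProcessed.add(split.path()))
                .collect(Collectors.toList());
    }
}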

Meanwhile, here is a follow-up question: when a file has been updated, should 
all of its content be reprocessed?
If your answer is yes, the easiest way is to have the upstream system write the 
data to a new file; the sketch below shows a FileSource that continuously picks 
up such new files.
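
A minimal sketch (the bucket path, monitoring interval, and job name are 
placeholders):

import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class NewFilePerUpdateJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Check the directory every 30 seconds; a file written under a new
        // path (e.g. data-2024-01-10-001.csv) will be picked up, while a
        // rewrite of an already-seen path is ignored.
        FileSource<String> source =
                FileSource.forRecordStreamFormat(
                                new TextLineInputFormat(),
                                new Path("s3://my-bucket/input/")) // placeholder
                        .monitorContinuously(Duration.ofSeconds(30))
                        .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "file-source")
                .print();

        env.execute("new-file-per-update");
    }
}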

If your answer is no, then you need a data source that supports incremental 
reading. Consider a data lake format such as Apache Paimon [2] as an 
alternative to plain files; it gives you far more ability to manipulate the 
data, for example reading only the changes since the last read.
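
As a rough illustration of the Paimon route (assuming the Paimon Flink 
connector jars are on the classpath; the catalog name, warehouse path, and 
table name are placeholders):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class PaimonIncrementalRead {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Register a Paimon catalog; the warehouse path is a placeholder.
        tEnv.executeSql(
                "CREATE CATALOG paimon WITH ("
                        + "'type' = 'paimon',"
                        + "'warehouse' = 's3://my-bucket/paimon')");
        tEnv.executeSql("USE CATALOG paimon");

        // A streaming query on a Paimon table reads the table's changelog
        // incrementally, so upstream updates arrive as change records
        // instead of requiring a full re-read of a file.
        tEnv.executeSql("SELECT * FROM my_table").print();
    }
}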

[1] 
https://github.com/apache/flink/blob/cad090aaed770c90facb6edbcce57dd341449a02/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/impl/ContinuousFileSplitEnumerator.java#L62C33-L62C54
[2] 
https://paimon.apache.org/docs/master/concepts/overview/

Best,
Yu Chen


> On Jan 10, 2024, at 04:31, Nitin Saini <nitinsaini1...@gmail.com> wrote:
> 
> Hi Flink Community,
> 
> I was using Flink 1.12.7's readFile to read files from S3, and it was able 
> to monitor both new files being added and updates to existing files.
> 
> I have now migrated to Flink 1.17.2 and use FileSource to read files from 
> S3. It can monitor new files being added to S3, but it cannot detect 
> changes in existing files.
> 
> Is there any way in Flink 1.17.2 to achieve that functionality as well, 
> i.e. to also monitor changes in existing files, by overriding some classes 
> or by doing something else?
> 
> Thanks & Regards,
> Nitin Saini
