Hi Xilang,

I think you are working together with the offline team. As you said, the ETL team wants to use the data in HDFS. I would like to confirm one thing with you: what is the scheduling interval for each of their jobs? 5 minutes or 10 minutes?

> My use case is that we read data from a message queue and write it to HDFS,
> and our ETL team uses the data in HDFS. ETL needs to know accurately
> whether all data is ready to be read

I think you want a feature that lets the ETL team know when a bucket is ready for them to use. Is that correct? If so, please take a look at this JIRA: https://issues.apache.org/jira/browse/FLINK-9609

Cheers,
Minglei

> On June 29, 2018, at 9:03 AM, XilangYan <xilang....@gmail.com> wrote:
>
> Hi Febian,
>
> I finally had time to read the code, and it is brilliant: it does provide
> an exactly-once guarantee.
> However, I still suggest adding a function that can close a file when a
> checkpoint is made. I noticed that there is an enhancement,
> https://issues.apache.org/jira/browse/FLINK-9138, which can close a file on
> a time-based rollover, but it is not very accurate.
> My use case is that we read data from a message queue and write it to HDFS,
> and our ETL team uses the data in HDFS. In this case, ETL needs to know
> accurately whether all data is ready to be read, so we use a counter to
> count how many records have been written. If the counter equals the number
> we received, we consider the HDFS file ready. We send the counter message
> in a custom sink so ETL knows how many records have been written, but with
> the current BucketingSink, even though the HDFS file is flushed, ETL may
> still be unable to read the data. If we could close the file during a
> checkpoint, the result would be accurate. And the HDFS small-file problem
> can be controlled by using a bigger checkpoint interval.
>
> I did take the BucketingSink code and adapt it to our case, but if this can
> be done in Flink, we can save the time of maintaining our own branch.
>
> Thanks!
> Jeffrey
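
For reference, here is a minimal sketch of the idea described in the quoted message: close the in-progress file when a checkpoint happens, report a counter, and let the reader compare that counter with the number of records received. It is plain Java with no Flink dependency. The names `CheckpointClosingWriter`, `onCheckpoint`, `readableRecordCount`, and `isReady` are hypothetical and are not part of BucketingSink or any Flink API; in-memory lists stand in for HDFS files.

```java
import java.util.ArrayList;
import java.util.List;

final class CheckpointReadySketch {

    /** Hypothetical stand-in for a sink writer; NOT Flink's BucketingSink. */
    static final class CheckpointClosingWriter {
        // Records in the currently open file (assumed not yet readable by ETL).
        private final List<String> inProgress = new ArrayList<>();
        // Closed files, which are assumed to be fully readable by ETL.
        private final List<List<String>> closedFiles = new ArrayList<>();
        // Counter that would be sent to ETL in the "counter message".
        private long reportedCount = 0;

        void write(String record) {
            inProgress.add(record);
        }

        /** On checkpoint: close the open file, then report the new total. */
        long onCheckpoint() {
            if (!inProgress.isEmpty()) {
                closedFiles.add(new ArrayList<>(inProgress));
                reportedCount += inProgress.size();
                inProgress.clear();
            }
            return reportedCount;
        }

        /** Number of records a reader can actually see (closed files only). */
        long readableRecordCount() {
            long total = 0;
            for (List<String> file : closedFiles) {
                total += file.size();
            }
            return total;
        }
    }

    /** ETL-side check: data is ready when the counts match. */
    static boolean isReady(long receivedCount, long reportedCount) {
        return receivedCount == reportedCount;
    }

    public static void main(String[] args) {
        CheckpointClosingWriter writer = new CheckpointClosingWriter();
        writer.write("a");
        writer.write("b");
        writer.write("c");

        System.out.println("readable before checkpoint: " + writer.readableRecordCount());
        long reported = writer.onCheckpoint();
        System.out.println("readable after checkpoint: " + writer.readableRecordCount());
        System.out.println("ready: " + isReady(3, reported));
    }
}
```

Because the counter only advances when a file is closed, the reported count never exceeds what a reader can see. The trade-off is the one mentioned in the quoted message: a shorter checkpoint interval produces more, smaller files.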