Hi Xilang

I think you are working together with the offline team. As you said, the ETL 
team wants to use the data in HDFS. I would like to confirm one thing with 
you: how often is each of their jobs scheduled? Every 5 minutes, or every 10 
minutes?

> My use case is: we read data from a message queue, write it to HDFS, and our
> ETL team uses the data in HDFS. ETL needs to know accurately whether all the
> data is ready to be read

I think you are looking for a feature that lets the ETL team know when a 
bucket is ready for them to use. Correct? If so, please take a look at this 
JIRA issue: https://issues.apache.org/jira/browse/FLINK-9609
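
To make the readiness idea concrete, here is a minimal Python sketch (not Flink
code) of the counter check described in the quoted message, combined with
closing files at checkpoint time. BucketTracker and all of its methods are
hypothetical names invented for illustration; they are not part of
BucketingSink or of FLINK-9609, and the sketch assumes a bucket only becomes
readable once its file has been closed.

```python
from collections import defaultdict


class BucketTracker:
    """Hypothetical helper: tracks per-bucket record counts and closed state."""

    def __init__(self):
        self.received = defaultdict(int)  # records read from the message queue
        self.written = defaultdict(int)   # records written to the bucket's file
        self.closed = set()               # buckets whose file has been closed

    def on_received(self, bucket, n=1):
        self.received[bucket] += n

    def on_written(self, bucket, n=1):
        self.written[bucket] += n
        # New data reopens the bucket, so it is no longer considered closed.
        self.closed.discard(bucket)

    def on_checkpoint(self):
        # Assumption: every bucket with written data is closed at a checkpoint.
        self.closed.update(self.written.keys())

    def is_ready(self, bucket):
        # Ready only if the file is closed AND the counters agree.
        return (
            bucket in self.closed
            and self.written[bucket] == self.received[bucket]
        )


tracker = BucketTracker()
tracker.on_received("2018-06-29", 3)
tracker.on_written("2018-06-29", 3)
print(tracker.is_ready("2018-06-29"))  # False: flushed but not yet closed
tracker.on_checkpoint()
print(tracker.is_ready("2018-06-29"))  # True: closed and counters match
```

The point of the sketch is that matching counters alone are not enough: the
bucket is reported ready only after the checkpoint has closed its file.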

Cheers
Minglei


> On June 29, 2018, at 9:03 AM, XilangYan <xilang....@gmail.com> wrote:
> 
> Hi Febian,
> 
> Finally I had time to read the code, and it is brilliant: it does provide
> an exactly-once guarantee.
> However, I still suggest adding a function that can close a file when a
> checkpoint is made. I noticed that there is an enhancement,
> https://issues.apache.org/jira/browse/FLINK-9138, which can close a file on a
> time-based rollover, but it is not very accurate.
> My use case is: we read data from a message queue, write it to HDFS, and our
> ETL team uses the data in HDFS. In this case, ETL needs to know accurately
> whether all the data is ready to be read, so we use a counter to count how
> many records have been written; if the counter equals the number we received,
> we consider the HDFS file ready. We send the counter message in a custom sink
> so ETL knows how many records have been written, but with the current
> BucketingSink, even though the HDFS file is flushed, ETL may still be unable
> to read the data. If we could close the file during a checkpoint, the result
> would be accurate. As for the HDFS small-file problem, it can be controlled by
> using a larger checkpoint interval.
> 
> I did take the BucketingSink code and adapt it to our case, but if this can be
> done in Flink, we can save the time spent maintaining our own branch.
> 
> Thanks!
> Jeffrey
