Hi Henry,

Flushing of buffered data in sinks should occur on two occasions: 1) when some 
buffer size limit is reached or a fixed flush interval fires, and 2) on 
checkpoints.

Flushing any pending data before completing a checkpoint gives the sink 
at-least-once guarantees, so that should answer your question about data loss.
As for the data delay caused by buffering, my suggestion would be a 
time-interval-based flushing configuration, with the flush driven by a timer 
rather than by incoming records.
That is what currently happens, for example, in the Kafka / Kinesis producer 
sinks: records are buffered and flushed at fixed intervals or when the buffer 
is full, and they are also flushed on every checkpoint.
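To make that concrete, below is a minimal sketch of such a sink. This is not 
the actual Kafka / Kinesis producer code; the class name, buffer limit, and 
flush interval are illustrative assumptions. It flushes when the buffer-size 
limit is reached, on a fixed timer (so a partial batch is never stuck waiting 
for the next record), and in snapshotState() so pending data is covered by the 
checkpoint:

import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class BufferingSink extends RichSinkFunction<String>
        implements CheckpointedFunction {

    private static final int MAX_BUFFER_SIZE = 50;        // illustrative limit
    private static final long FLUSH_INTERVAL_MS = 1_000;  // illustrative interval

    private final List<String> buffer = new ArrayList<>();
    private transient ScheduledExecutorService scheduler;

    @Override
    public void open(Configuration parameters) {
        // A separate timer thread makes the interval flush fire even when no
        // records arrive, so a partially filled buffer is never left waiting.
        scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                this::safeFlush, FLUSH_INTERVAL_MS, FLUSH_INTERVAL_MS,
                TimeUnit.MILLISECONDS);
    }

    @Override
    public void invoke(String value, Context context) {
        synchronized (buffer) {
            buffer.add(value);
            if (buffer.size() >= MAX_BUFFER_SIZE) {  // 1) buffer-size-limit flush
                flush();
            }
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext ctx) {
        // 2) flush pending data before the checkpoint completes,
        //    which is what gives the sink at-least-once guarantees
        safeFlush();
    }

    @Override
    public void initializeState(FunctionInitializationContext ctx) {
        // nothing to restore: everything pending was flushed at checkpoint time
    }

    @Override
    public void close() {
        if (scheduler != null) {
            scheduler.shutdown();
        }
        safeFlush();
    }

    private void safeFlush() {
        synchronized (buffer) {
            flush();
        }
    }

    private void flush() {
        // write the buffered records to the external system here, then clear
        buffer.clear();
    }
}

With checkpointing enabled on the job, snapshotState() is invoked on every 
checkpoint, so at most one flush interval's worth of data is ever delayed, and 
no buffered record is lost across a completed checkpoint.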

Cheers,
Gordon

On 13 November 2018 at 5:07:32 PM, 徐涛 (happydexu...@gmail.com) wrote:

Hi Experts,
When we implement a sink, we usually batch writes by record count or by a time 
interval; however, this may leave the data of the last batch unwritten to the 
sink, because the flush is only triggered by an incoming record.
I also tested the JDBCOutputFormat provided by Flink, and found that it has 
the same problem. If the batch size is 50 and 49 items arrive, but the last 
one comes an hour later, then the 49 items will not be written to the sink 
during that hour. This may cause data delay or data loss.
Could anyone propose a solution to this problem?
Thanks a lot.

Best
Henry 
