Hi,
Still stuck on this.
My understanding is that this is something Flink can't handle. If the Kafka
producer's batch.size is non-zero (which it ideally should be), there will be
in-memory records and, in boundary cases, data loss. The only way I can handle
this with Flink is through my checkpointing interval, which flushes any
buffered records.
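Something like this minimal sketch is what I mean (the 5-second interval is
just an example value):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointingSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 5 seconds (example value). With AT_LEAST_ONCE,
        // the Kafka sink flushes the producer's in-memory buffer on each
        // checkpoint, so anything buffered after the last completed
        // checkpoint has to be replayed from the source on recovery.
        env.enableCheckpointing(5000);

        // ... source -> transformations -> Kafka sink, then env.execute(...)
    }
}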
Is my understanding correct here? Or am I still missing something?
thanks,
Chirag
On Monday, 12 March, 2018, 12:59:51 PM IST, Chirag Dewan
<chirag.dewa...@yahoo.in> wrote:
Hi,
I am trying to use the Kafka 0.11 sink with AT_LEAST_ONCE semantics and am
experiencing some data loss on Task Manager failure.
It's a simple job with parallelism=1 and a single Task Manager. After a few
checkpoints (Kafka flushes), I kill one of my Task Managers running as a
container on Docker Swarm.
I observe a small number of records, usually 4-5, being lost on the Kafka
broker (1-broker cluster, 1 topic with 1 partition).
My FlinkKafkaProducer config is as follows:
batch.size = 16384 (default)
retries = 3
max.in.flight.requests.per.connection = 1
acks = 1
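Roughly, the sink is built like this (a sketch; the broker address, topic name
and schema are placeholders):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper;

public class KafkaSinkSketch {
    public static FlinkKafkaProducer011<String> buildSink() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("batch.size", "16384");                  // default
        props.setProperty("retries", "3");
        props.setProperty("max.in.flight.requests.per.connection", "1");
        props.setProperty("acks", "1");

        // AT_LEAST_ONCE: the sink flushes the underlying KafkaProducer on
        // every checkpoint instead of using Kafka transactions.
        return new FlinkKafkaProducer011<>(
                "my-topic", // placeholder topic
                new KeyedSerializationSchemaWrapper<>(new SimpleStringSchema()),
                props,
                FlinkKafkaProducer011.Semantic.AT_LEAST_ONCE);
    }
}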
As I understand it, all the messages batched by the KafkaProducer
(RecordAccumulator) in its memory buffer are lost. Is this why I can't see my
records on the broker? Or is there something I am doing terribly wrong? Any
help is appreciated.
TIA,
Chirag