Hello Flink Community ,


We are running Jobs in flink version 1.12.7 which reads from Kafka , apply
some rules(stored in broadcast state) and then writes to kafka. This is a
very low latency and high throughput and we have set up at least one
semantics.



Checkpoint Configuration Used

   1. We cannot have many duplicates during the restarts so we have set a
   checkpoint interval of 3s. (We cannot increase it any more since , we have
   10s of 1000s of records processed per sec ) .
   2. Checkpointing target location is AWS S3.
   3. Max Concurrent Checkpoint is 1
   4. Time Between Checkpoints is 500ms

Earlier we had around 10 rule objects stored in broadcast state. Recently
we have enabled 80 rule objects.  Post increase , we are seeing a lot of
checkpoints in progress . (Earlier we had rarely seen this in metrics
dashboard).  The Parallelism of BroadCast Function is around 10 and the
present Checkpoint size is 64kb.



Since we expect this rule objects to increase to 1000 and beyond in a
year's time, we are looking at ways to improve performance in checkpoint.
We cannot use incremental checkpoint since its supported only in RocksDB
and the development arc is little higher. Looking at easier solution first
, we tried using "SnapshotCompression" , but we did not see any difference
in decrease of checkpoint size.



Have few questions on the same

   1. Does SnapshotCompression work in version 1.12.7 ?
   2. If Yes , how much size reduction could we expect if this is enabled
   and at what size does the Compression works . Is there any threshold post
   only which the compression would work ?



Apart from the questions above , you are welcome to suggest any config
changes that can be done for improvements.



Thanks & Regards,

Prasanna

Reply via email to