Hi all,

Is Snappy compression enabled for broadcast state?

https://issues.apache.org/jira/browse/FLINK-30113 and
https://issues.apache.org/jira/browse/FLINK-30112 are marked as closed.

The documentation at
https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/ops/state/large_state_tuning/#compression
says only that "Compression works on the granularity of key-groups in keyed
state, i.e. each key-group can be decompressed individually, which is
important for rescaling."

There is no mention of broadcast state.
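
If it helps frame the question: the rules live in a
MapStateDescriptor-backed broadcast state, which Flink stores as operator
state rather than keyed state, so the key-group compression described above
would not seem to cover it. A minimal declaration sketch (the Rule type and
ruleStream are hypothetical placeholders):

    // Broadcast state is declared through a MapStateDescriptor and is
    // stored as operator state, not keyed state.
    MapStateDescriptor<String, Rule> ruleDescriptor =
        new MapStateDescriptor<>(
            "rules", Types.STRING, Types.POJO(Rule.class));

    // ruleStream carries rule updates into the broadcast state.
    BroadcastStream<Rule> ruleBroadcast = ruleStream.broadcast(ruleDescriptor);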

Thanks,
Prasanna.

On Mon, Jan 9, 2023 at 2:31 PM Martijn Visser <martijnvis...@apache.org>
wrote:

> Hi Prasanna,
>
> There is no support for compression in operator state. This can be tracked
> under https://issues.apache.org/jira/browse/FLINK-30113
>
> Best regards,
>
> Martijn
>
> On Fri, Jan 6, 2023 at 7:53 AM Prasanna kumar <
> prasannakumarram...@gmail.com> wrote:
>
>> Hello Flink Community,
>>
>>
>>
>> We are running jobs on Flink 1.12.7 that read from Kafka, apply some
>> rules (stored in broadcast state), and then write to Kafka. This is a
>> very low-latency, high-throughput pipeline, and we have set up
>> at-least-once semantics.
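>>
>> For reference, the rule application roughly follows the standard
>> BroadcastProcessFunction pattern. A minimal sketch (Event, Rule, and the
>> matches()/getId() methods are hypothetical placeholders):
>>
>>    public class RuleApplier
>>            extends BroadcastProcessFunction<Event, Rule, Event> {
>>
>>        private final MapStateDescriptor<String, Rule> ruleDescriptor;
>>
>>        public RuleApplier(MapStateDescriptor<String, Rule> descriptor) {
>>            this.ruleDescriptor = descriptor;
>>        }
>>
>>        @Override
>>        public void processElement(Event event, ReadOnlyContext ctx,
>>                Collector<Event> out) throws Exception {
>>            // Events see a read-only view of the broadcast rules.
>>            for (Map.Entry<String, Rule> entry :
>>                    ctx.getBroadcastState(ruleDescriptor).immutableEntries()) {
>>                if (entry.getValue().matches(event)) {
>>                    out.collect(event);
>>                }
>>            }
>>        }
>>
>>        @Override
>>        public void processBroadcastElement(Rule rule, Context ctx,
>>                Collector<Event> out) throws Exception {
>>            // Rule updates are written into the broadcast state.
>>            ctx.getBroadcastState(ruleDescriptor).put(rule.getId(), rule);
>>        }
>>    }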
>>
>>
>>
>> Checkpoint configuration used (a code sketch of these settings follows
>> the list):
>>
>>    1. We cannot tolerate many duplicates during restarts, so we have set
>>    a checkpoint interval of 3s. (We cannot increase it any further, since
>>    we process tens of thousands of records per second.)
>>    2. The checkpointing target location is AWS S3.
>>    3. Max concurrent checkpoints is 1.
>>    4. The minimum time between checkpoints is 500ms.
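>>
>>    A minimal sketch of the settings above, using the standard
>>    StreamExecutionEnvironment / CheckpointConfig APIs (the S3 target is
>>    typically configured via state.checkpoints.dir, not shown here):
>>
>>        StreamExecutionEnvironment env =
>>            StreamExecutionEnvironment.getExecutionEnvironment();
>>        // 3s checkpoint interval with at-least-once semantics.
>>        env.enableCheckpointing(3000, CheckpointingMode.AT_LEAST_ONCE);
>>        // At most one checkpoint in flight at any time.
>>        env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
>>        // 500ms minimum pause between consecutive checkpoints.
>>        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500);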
>>
>> Earlier we had around 10 rule objects stored in broadcast state; recently
>> we enabled 80 rule objects. Since the increase, we have been seeing a lot
>> of checkpoints in progress (earlier we rarely saw this in the metrics
>> dashboard). The parallelism of the broadcast function is around 10, and
>> the present checkpoint size is 64 KB.
>>
>>
>>
>> Since we expect the number of rule objects to grow to 1000 and beyond
>> within a year, we are looking at ways to improve checkpoint performance.
>> We cannot use incremental checkpoints, since they are supported only with
>> RocksDB and the development effort is a little higher. Looking at the
>> easier solution first, we tried enabling "SnapshotCompression", but we
>> did not see any decrease in checkpoint size.
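>>
>> For clarity, the flag we mean is the standard ExecutionConfig snapshot
>> compression switch, enabled roughly like this:
>>
>>     StreamExecutionEnvironment env =
>>         StreamExecutionEnvironment.getExecutionEnvironment();
>>     // Enables snappy compression for state snapshots; per the docs this
>>     // works per key-group on keyed state.
>>     env.getConfig().setUseSnapshotCompression(true);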
>>
>>
>>
>> We have a few questions on the same:
>>
>>    1. Does SnapshotCompression work in version 1.12.7?
>>    2. If yes, how much of a size reduction could we expect once it is
>>    enabled, and at what state size does the compression take effect? Is
>>    there a threshold only beyond which compression works?
>>
>>
>>
>> Apart from the questions above, you are welcome to suggest any config
>> changes that could bring improvements.
>>
>>
>>
>> Thanks & Regards,
>>
>> Prasanna
>>
>
