Hi Soumitra,

It's a good improvement, In general I'm +1 on this. Several questions /
suggestions:

1. Could you please share the benchmark results in brief? I have seen your
perf's README but I could only find a throughput of 1,000 rec/s v.s. 500
rec/s comparison. I would like to know the state size and the usage of cpu
or I/O across different setups. It would be even better if we could
determine the overhead of the JNI. I'm also thinking that we may provide
some built-in merge operators for common primitives like sum of long
written in C on frocksdb side, to save the JNI overhead during compaction.

2. I noticed that you are using the DataStream API for testing. So the
built-in windows cannot benefit from this improvement due to the
retractions or late messages?

3. I can see there are some changes to the flink's public API in your
branch, right? I thought that no public API changes would be required. We
need to maintain API signature compatibility. If the changes are truly
necessary, it is required to file a FLIP under [1] and collect further
comments from the mailing list. Once the FLIP has been approved through a
vote, we can proceed.


[1]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

Best,
Zakelly

On Thu, Apr 16, 2026 at 12:38 PM Soumitra Kumar <[email protected]>
wrote:

> Hello Community,
>
> I wanted to share some work I have been doing on the frocksdb and Flink
> that I think is useful for the Flink community.
>
> I have implemented support for Java base associative merge operators in
> https://github.com/ververica/frocksdb and use that to support additional
> reducing and aggregating state variables in Flink. I have used this do
> event reordering in a flink app. All the code is in my github repo (
> https://github.com/soumitrak) and I will be more than happy to work with
> the members to contribute the code back to frocksdb and Flink.
>
> I have filed a followup task (
> https://issues.apache.org/jira/browse/FLINK-39456) to leverage the support
> in frocksdb to expose the state variables in the Flink.
>
> Code in my forked repos:
> https://github.com/soumitrak/frocksdb/commits/FRocksDB-8.10.0-SK/ -
> Created
> a branch off FRocksDB-8.10.0 and committed the changes
> https://github.com/soumitrak/flink/commits/rocksdb-merge-operator/ -
> Created a branch off master and committed the changes
> https://github.com/soumitrak/flink_streaming_event_reordering - Test
> project used to build, test, perf test, and compare the performance of
> heap-based, ValueState, and new MergeState (using associative merge
> operator).
>
> I should have started two email threads, but they are related, so added the
> details in one.
>
> Looking forward to guidance on how to go about this.
> Best, -Soumitra.
>

Reply via email to