Hi Zakelly,

Thanks for sharing all these resources, I took some time to go through them and 
wanted to share a quick summary and a few thoughts.

Here’s a brief recap of what was discussed on the mailing list (more details in 
[1]):

They proposed introducing a BinarySortedMultiMapState<UK, UV> with two 
user-facing APIs:
- BinarySortedMultiMapState<UK, UV> for storing multiple values per key (add(K 
key, V value) to append a value, etc.)
- BinarySortedMapState<UK, UV> as a simpler option when there’s only one value 
per key (a special case of the first one)

The readRange() method would use an eager-batch mode by default (fetching 
entries in chunks of 128), which strikes a good balance between lazy access 
(one by one) and fully eager reads.

Main challenge: ensuring that the binary representation preserves the same 
ordering as the Java Comparable.
So keys would need a custom serializer::
interface LexicographicalTypeSerializer<T> extends TypeSerializer<T> {
  Optional<Comparator<T>> findComparator(); // useful for HeapStateBackend
}
This allows users to define custom key types while preserving sort order.

For the HeapStateBackend, a skip list was suggested instead of a TreeMap, as it 
offers better memory locality and fewer allocations, while still maintaining 
O(log n) performance.

Note 1: From what I understand (I haven’t looked into Flink’s internals yet), 
the proposed encoding should be compatible with ForSt as well.

Note 2: For the order book use case, sorting doubles (IEEE 754) is fairly 
straightforward:

long toOrderedBits(double value) {
long bits = Double.doubleToRawLongBits(value);
return bits ^ (((bits >> 63) - 1L) | 0x8000000000000000L);
}


[1] https://cwiki.apache.org/confluence/x/Xo_FD

Thanks gain,

Charles

From: Zakelly Lan <zakelly....@gmail.com>
Date: Monday, April 14, 2025 at 5:55 AM
To: dev@flink.apache.org <dev@flink.apache.org>
Subject: Re: [DISCUSS] Missing TreeMapState abstraction for maintaining ordered 
state

Hi Charles,

There is a FLIP-220[1] and discussions[2] about introducing a sorted map
state in binary order. But it seems like there is no further progress and
conclusion. It would be nice if anyone could drive this.

I would +1 for introducing this. As the RocksDB stores key-value in binary
order, we should also keep that order. And the API design requires more
discussion.


[1] https://cwiki.apache.org/confluence/x/Xo_FD
[2] https://lists.apache.org/thread/lbdpr1cjt9ddrotowmzzk4lv787b3t5z

Best,
Zakelly

On Sat, Apr 12, 2025 at 6:35 PM Charles COLELLA <charles...@hotmail.fr>
wrote:

> Hi all,
>
> I'm building a Flink application that processes real-time order book data
> (~70k events/sec), partitioned by instrument.
>
>
> For each key, I need to maintain an order book where levels are kept
> ordered by price. Since Flink's state API doesn't support ordered
> structures like a NavigableMap or TreeMap, I currently have to:
>
> - Store all levels in a MapState<Double, OrderLevel> for persistence,
>
> - Maintain a parallel TreeMap<Double, OrderLevel> in memory to preserve
> ordering,
>
> - Keep both in sync on every update,
>
> - And rebuild the memory structure from MapState during recovery.
>
>
> This adds significant overhead:
>
> - Extra memory (duplicate data structures),
>
> - Extra logic (synchronization and consistency),
>
> - Extra CPU cost (sorting, filtering, top-K extraction).
>
> The use case (maintaining sorted state per key) is common in financial
> applications, ranking systems, and any stream logic requiring efficient
> ordered access.
>
>
> I’ve tried to search the Jira, dev mailing list, and flips, but couldn’t
> find a discussion on 'TreeMapState', 'SortedMapState`, or abstraction that
> would support this.
>
>
> Has this been discussed before, or would it make sense to open a proposal?
>
>
> Thanks,
>
> Charles
>

Reply via email to