On 2024/09/25 21:59:53 Heesung Sohn wrote:
> Hi team,
> Can we use a space-efficient probabilistic data structure, such as
> bloom filter to store the acked msg ids, if re-delivering acked msgs
> are not so strict?

Talking about algorithms, there are also efficient ways optimized for storing 
integers losslessly. 
FastPFor (https://github.com/lemire/FastPFor) is a C++ implementation in this 
category. There's also a Java port: https://github.com/lemire/JavaFastPFOR.
A good collection of research papers on this topic can be found here:
https://github.com/lemire/FastPFor?tab=readme-ov-file#reference-and-documentation.

For bitmaps, we already use the RoaringBitmap library 
(https://github.com/RoaringBitmap/RoaringBitmap).
RoaringBitmap also has a portable storage format definition: 
https://github.com/RoaringBitmap/RoaringFormatSpec.
There are also research papers specifically about bitmaps:
https://github.com/RoaringBitmap/RoaringBitmap?tab=readme-ov-file#scientific-documentation.

I'd hope to see these used for storing individual acks in Pulsar, but it seems 
to be too late to handle that when there are already other implementations that 
are sufficient for solving the problem.

-Lari

Reply via email to