On 2024/09/25 21:59:53 Heesung Sohn wrote: > Hi team, > Can we use a space-efficient probabilistic data structure, such as > bloom filter to store the acked msg ids, if re-delivering acked msgs > are not so strict?
Talking about algorithms, there are also efficient ways optimized for storing integers losslessly. FastPFor (https://github.com/lemire/FastPFor) is a C++ implementation in this category. There's also a Java port: https://github.com/lemire/JavaFastPFOR. A good collection of research papers on this topic can be found here: https://github.com/lemire/FastPFor?tab=readme-ov-file#reference-and-documentation. For bitmaps, we already use the RoaringBitmap library (https://github.com/RoaringBitmap/RoaringBitmap). RoaringBitmap also has a portable storage format definition: https://github.com/RoaringBitmap/RoaringFormatSpec. There are also research papers specifically about bitmaps: https://github.com/RoaringBitmap/RoaringBitmap?tab=readme-ov-file#scientific-documentation. I'd hope to see these used for storing individual acks in Pulsar, but it seems to be too late to handle that when there are already other implementations that are sufficient for solving the problem. -Lari