Hi Pulsar Community,

I have created a proposal to add compression for ManagedCursorInfo. The 
proposal can be found at: https://github.com/apache/pulsar/issues/14529

Thanks,
Zixuan

------------------

Motivation

Cursor data is stored in the ZooKeeper/etcd metadata store. As the cursor data 
grows, it takes longer to pull from the store. Compressing the cursor data 
reduces both its size and the time needed to pull it.

Goal

Support compressing the ManagedCursorInfo with LZ4, ZLIB, ZSTD, or SNAPPY.

Implementation

CursorInfo compression format

[MAGIC_NUMBER] + [METADATA_SIZE] + [METADATA_PAYLOAD] + 
[MANAGED_CURSOR_INFO_PAYLOAD]

MAGIC_NUMBER: 0x4779
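
To make the layout above concrete, here is a minimal sketch in plain Java. The class and method names are hypothetical, and the field widths (a 2-byte magic number and a 4-byte metadata size, both big-endian) are my assumptions; the proposal text does not fix them. The metadata and payload are treated as opaque byte arrays.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Pulsar: illustrates the proposed layout only.
final class CursorInfoFrame {
    // Magic number from the proposal. The 2-byte width is an assumption.
    static final short MAGIC_NUMBER = 0x4779;

    /** Builds [MAGIC_NUMBER][METADATA_SIZE][METADATA_PAYLOAD][MANAGED_CURSOR_INFO_PAYLOAD]. */
    static byte[] frame(byte[] metadata, byte[] compressedInfo) {
        ByteBuffer buf = ByteBuffer.allocate(2 + 4 + metadata.length + compressedInfo.length);
        buf.putShort(MAGIC_NUMBER);      // 2 bytes (assumed width)
        buf.putInt(metadata.length);     // 4 bytes (assumed width)
        buf.put(metadata);               // serialized ManagedCursorInfoMetadata
        buf.put(compressedInfo);         // compressed ManagedCursorInfo
        return buf.array();
    }

    /** Splits a frame back into { metadata, compressed payload }. */
    static byte[][] unframe(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        if (data.length < 6 || buf.getShort() != MAGIC_NUMBER) {
            throw new IllegalArgumentException("not a compressed cursor frame");
        }
        byte[] metadata = new byte[buf.getInt()];
        buf.get(metadata);
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        return new byte[][] {metadata, payload};
    }
}
```

Putting the metadata size right after the magic number lets a reader skip to the payload without knowing how the metadata is encoded.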

METADATA
Add a message named ManagedCursorInfoMetadata to MLDataFormats.proto:

message ManagedCursorInfoMetadata {
    required CompressionType compressionType = 1;
    required int32 uncompressedSize = 2;
}

CursorInfo compression and decompression design

These compression types are already defined and implemented in Pulsar, so we 
only need to handle compression and decompression of the ManagedCursorInfo 
data:

Get CursorInfo from the metadata store
We check the header of the cursor data. If it is compressed, we parse the 
bytes using the compressed format; otherwise we parse them the original way.

Add/Update CursorInfo to the metadata store
Compression is applied by default whenever a compression type is specified.
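
The read and write paths described above could look roughly like the following sketch. It is not the actual Pulsar code: the class name is hypothetical, ZLIB from java.util.zip stands in for Pulsar's own codecs, the 2-byte magic and 4-byte size fields are assumed widths, and the metadata is simplified to a single 4-byte uncompressedSize instead of the protobuf message.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch of the write/read paths; ZLIB stands in for Pulsar's codecs.
final class CursorInfoCodec {
    static final short MAGIC_NUMBER = 0x4779;
    static final int HEADER_SIZE = 2 + 4; // assumed: 2-byte magic + 4-byte metadata size

    /** Write path: store the bytes as-is when compression is off, else deflate and frame. */
    static byte[] encode(byte[] cursorInfo, boolean compress) {
        if (!compress) {
            return cursorInfo;
        }
        Deflater deflater = new Deflater();
        deflater.setInput(cursorInfo);
        deflater.finish();
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        while (!deflater.finished()) {
            compressed.write(chunk, 0, deflater.deflate(chunk));
        }
        deflater.end();
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE + 4 + compressed.size());
        buf.putShort(MAGIC_NUMBER);
        buf.putInt(4);                  // metadata size
        buf.putInt(cursorInfo.length);  // stand-in metadata: uncompressedSize only
        buf.put(compressed.toByteArray());
        return buf.array();
    }

    /** Read path: inflate when the magic number is present, else return the legacy bytes. */
    static byte[] decode(byte[] stored) {
        if (stored.length < HEADER_SIZE || ByteBuffer.wrap(stored).getShort() != MAGIC_NUMBER) {
            return stored; // no magic number: parse the original way
        }
        ByteBuffer buf = ByteBuffer.wrap(stored);
        buf.getShort();                        // skip magic number
        int metadataSize = buf.getInt();
        byte[] out = new byte[buf.getInt()];   // uncompressedSize from the metadata
        int offset = HEADER_SIZE + metadataSize;
        Inflater inflater = new Inflater();
        inflater.setInput(stored, offset, stored.length - offset);
        try {
            int filled = 0;
            while (filled < out.length && !inflater.finished()) {
                int n = inflater.inflate(out, filled, out.length - filled);
                if (n == 0) {
                    break; // truncated or corrupt input
                }
                filled += n;
            }
            if (filled != out.length) {
                throw new IllegalStateException("unexpected uncompressed size");
            }
            return out;
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed cursor data", e);
        } finally {
            inflater.end();
        }
    }
}
```

Because the reader falls back to the original parsing when the magic number is absent, cursor data written before compression was enabled stays readable.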

CursorInfo compression type configuration

Add managedCursorInfoCompressionType to 
org.apache.pulsar.broker.ServiceConfiguration and 
org.apache.bookkeeper.mledger.ManagedLedgerFactoryConfig.
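
As an illustration of how such a setting might be resolved, here is a small sketch. Only the key name managedCursorInfoCompressionType comes from the proposal. The enum, the use of java.util.Properties, and NONE as the default when the key is unset are my assumptions, not the actual ServiceConfiguration code.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical sketch; Pulsar's real configuration classes are not used here.
final class CursorCompressionConfig {
    /** Codecs named in the proposal, plus NONE as an assumed "disabled" value. */
    enum CompressionType { NONE, LZ4, ZLIB, ZSTD, SNAPPY }

    /** Reads the setting, falling back to NONE (assumed default) when it is unset. */
    static CompressionType fromProperties(Properties conf) {
        String raw = conf.getProperty("managedCursorInfoCompressionType", "NONE");
        // valueOf throws IllegalArgumentException for an unknown codec name.
        return CompressionType.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }
}
```

With this sketch, an unknown value fails fast at startup instead of silently disabling compression.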

