Hi Xiaolong,

It is disabled by default. Once you enable this feature:

When reading your data, we check the data header. If the data is compressed, we parse it using the compression format; otherwise we parse it the original way.

When updating your data, we compress it using the configured compression type.
Once you enable this feature, we don't support rolling the data back to the previous version's format.

Thanks,
Zixuan

r...@apache.org <ranxiaolong...@gmail.com> wrote on Mon, Mar 7, 2022 at 16:16:

> Hi Zixuan:
>
> Here I am more concerned about whether this feature will break backward
> compatibility. For historical data or old clusters, how do we use this
> feature?
>
> --
> Thanks
> Xiaolong Ran
>
> Zixuan Liu <node...@gmail.com> wrote on Mon, Mar 7, 2022 at 15:14:
>
> > Hi everyone,
> >
> > Good catch! I updated my proposal at
> > https://github.com/apache/pulsar/issues/14529, and a compatibility
> > section has been appended:
> >
> > 1. The compression is disabled by default.
> > 2. We need to consider how to migrate the old data when this compression
> > has been enabled. If the cursor data header is in the compressed format,
> > we will parse the bytes using the compressed format; otherwise we will
> > parse the cursor data directly in the original way.
> >
> > Zixuan Liu <node...@gmail.com> wrote on Mon, Mar 7, 2022 at 15:11:
> >
> > > Hi PengHui,
> > >
> > > Sorry, the correct URL: https://github.com/apache/pulsar/issues/14529.
> > >
> > > :( Because of a subscription problem, the emails here are very
> > > confusing.
> > >
> > >
> > > PengHui Li <peng...@apache.org> wrote on Mon, Mar 7, 2022 at 12:39:
> > >
> > >> Hi Zixuan,
> > >>
> > >> Looks like you have added the wrong link for the proposal?
> > >> https://github.com/apache/pulsar/issues/14395 is for PIP-44
> > >>
> > >> Penghui
> > >>
> > >> On Mon, Mar 7, 2022 at 12:37 PM PengHui Li <peng...@apache.org> wrote:
> > >>
> > >> > > This is a global setting now. But I wonder if we should compress it
> > >> > > only if the size is over a threshold?
> > >> >
> > >> > +1
> > >> >
> > >> > Penghui
> > >> >
> > >> > On Sun, Mar 6, 2022 at 6:57 PM Enrico Olivelli <eolive...@gmail.com>
> > >> > wrote:
> > >> >
> > >> >> On Sun, Mar 6, 2022, 05:04 Haiting Jiang <jianghait...@apache.org>
> > >> >> wrote:
> > >> >>
> > >> >> > This is a global setting now.
> > >> >> > But I wonder if we should compress it only if the size is over
> > >> >> > a threshold?
> > >> >>
> > >> >> Good idea
> > >> >>
> > >> >> Enrico
> > >> >>
> > >> >> > Because:
> > >> >> > 1. It's not easy for us to notice in advance that some managed cursor
> > >> >> > info is too large; normally it would be found only if it has an actual
> > >> >> > impact. But if we enable this compression in advance, it will take
> > >> >> > some extra computing resources.
> > >> >> > 2. It seems that it won't be a common case that this managed cursor
> > >> >> > info is too large (only if there are a lot of
> > >> >> > individualDeletedMessages and batchedEntryDeletionIndexInfo). So it
> > >> >> > is not quite necessary to compress all managed cursor info.
> > >> >> >
> > >> >> > Regards,
> > >> >> > Haiting
> > >> >> >
> > >> >> >
> > >> >> > On 2022/03/02 04:41:16 Zixuan Liu wrote:
> > >> >> > > Hi Pulsar Community,
> > >> >> > >
> > >> >> > > I created a proposal to support ManagedCursorInfo compression.
> > >> >> > >
> > >> >> > > The proposal can be found at:
> > >> >> > > https://github.com/apache/pulsar/issues/14395
> > >> >> > >
> > >> >> > > Motivation
> > >> >> > >
> > >> >> > > The cursor data is managed by the ZooKeeper/etcd metadata store. As
> > >> >> > > cursor data accumulates, the data size increases and pulling the
> > >> >> > > data takes a long time. Therefore, it is necessary to add
> > >> >> > > compression for the cursor, which can reduce the size of the data
> > >> >> > > and the time needed to pull it.
> > >> >> > >
> > >> >> > > Goal
> > >> >> > >
> > >> >> > > Support using LZ4/ZLIB/ZSTD/SNAPPY to compress the
> > >> >> > > ManagedCursorInfo.
> > >> >> > > Implementation
> > >> >> > >
> > >> >> > > - Cursor compression format
> > >> >> > >   [MAGIC_NUMBER] + [METADATA_SIZE] + [METADATA_PAYLOAD] +
> > >> >> > >   [MANAGED_CURSOR_INFO_PAYLOAD]
> > >> >> > >
> > >> >> > > - MAGIC_NUMBER
> > >> >> > >   0x4779
> > >> >> > >
> > >> >> > > - METADATA
> > >> >> > >   Add a message named ManagedCursorInfoMetadata to
> > >> >> > >   MLDataFormats.proto:
> > >> >> > >
> > >> >> > >   message ManagedCursorInfoMetadata {
> > >> >> > >       required CompressionType compressionType = 1;
> > >> >> > >       required int32 uncompressedSize = 2;
> > >> >> > >   }
> > >> >> > >
> > >> >> > > Currently, these compression types are already supported, so we only
> > >> >> > > need to deal with compression and decompression of the
> > >> >> > > ManagedCursorInfo data:
> > >> >> > >
> > >> >> > > - Get CursorInfo from the metadata store
> > >> >> > >   We will check the cursor data header. If it is compressed, we will
> > >> >> > >   parse the bytes using the compressed format; otherwise in the
> > >> >> > >   original way.
> > >> >> > >
> > >> >> > > - Add/Update CursorInfo to the metadata store
> > >> >> > >   The default is to use compression if the compression type is
> > >> >> > >   specified.
> > >> >> > >
> > >> >> > > Thanks,
> > >> >> > > Zixuan
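
For readers following the thread, the framing described in the proposal ([MAGIC_NUMBER] + [METADATA_SIZE] + [METADATA_PAYLOAD] + [MANAGED_CURSOR_INFO_PAYLOAD]) and the header check on read could be sketched roughly as below. This is only an illustration, not the Pulsar implementation. The class name CursorInfoCodec, the 2-byte magic number, the 4-byte METADATA_SIZE, and the hand-rolled 5-byte metadata (standing in for the protobuf ManagedCursorInfoMetadata message) are all assumptions, and only ZLIB is shown because it is available in the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical helper for illustration; field widths are assumptions. */
final class CursorInfoCodec {
    static final short MAGIC_NUMBER = 0x4779;   // value taken from the proposal
    static final int HEADER_SIZE = 2 + 4;       // assumed: 2-byte magic + 4-byte METADATA_SIZE
    static final int METADATA_SIZE = 1 + 4;     // assumed stand-in: type byte + uncompressedSize
    static final byte TYPE_ZLIB = 1;            // assumed id, not Pulsar's CompressionType enum

    /** Write path: compress the serialized cursor info and prepend the header. */
    static byte[] encode(byte[] cursorInfo) {
        Deflater deflater = new Deflater();
        deflater.setInput(cursorInfo);
        deflater.finish();
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            compressed.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        byte[] payload = compressed.toByteArray();

        ByteBuffer out = ByteBuffer.allocate(HEADER_SIZE + METADATA_SIZE + payload.length);
        out.putShort(MAGIC_NUMBER);      // MAGIC_NUMBER
        out.putInt(METADATA_SIZE);       // METADATA_SIZE
        out.put(TYPE_ZLIB);              // METADATA_PAYLOAD: compression type
        out.putInt(cursorInfo.length);   // METADATA_PAYLOAD: uncompressed size
        out.put(payload);                // MANAGED_CURSOR_INFO_PAYLOAD
        return out.array();
    }

    /** Read path: data without the magic number is returned unchanged (original format). */
    static byte[] decode(byte[] stored) {
        if (stored.length < HEADER_SIZE || ByteBuffer.wrap(stored).getShort() != MAGIC_NUMBER) {
            return stored;
        }
        ByteBuffer in = ByteBuffer.wrap(stored);
        in.getShort();                          // skip magic number
        int metadataSize = in.getInt();
        byte type = in.get();
        int uncompressedSize = in.getInt();
        if (type != TYPE_ZLIB) {
            throw new IllegalArgumentException("unsupported compression type: " + type);
        }
        in.position(HEADER_SIZE + metadataSize); // jump to the compressed payload

        Inflater inflater = new Inflater();
        inflater.setInput(stored, in.position(), in.remaining());
        byte[] result = new byte[uncompressedSize];
        try {
            int off = 0;
            while (off < result.length && !inflater.finished()) {
                int n = inflater.inflate(result, off, result.length - off);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or invalid payload");
                }
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed payload", e);
        } finally {
            inflater.end();
        }
        return result;
    }
}
```

With this layout, decode(encode(bytes)) returns the original bytes, and data written before the feature was enabled passes through decode untouched, which matches the compatibility behavior described earlier in the thread. A size threshold, as suggested by Haiting, would be a small check at the top of encode that returns the input unchanged when it is below the limit.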