Re: [VOTE] PIP-330: getMessagesById gets all messages
+1 (non-binding) On Tue, Jan 16, 2024 at 4:43 AM Dezhi Liu wrote: > +1 (non-binding) > > Thanks, > Dezhi Liu > > On 2024/01/15 09:33:48 Zixuan Liu wrote: > > Hi Pulsar Community, > > > > Voting for PIP-330: getMessagesById gets all messages > > > > PIP: https://github.com/apache/pulsar/pull/21873 > > Discussion thread: > > https://lists.apache.org/thread/vqyh3mvtvovd383sd8zxnlzsspdr863z > > > > Thanks, > > Zixuan > > >
Re: [VOTE] PIP-330: getMessagesById gets all messages
+1 nonbinding > 2024年1月17日 17:35,Asaf Mesika 写道: > > +1 (non-binding) > > On Tue, Jan 16, 2024 at 4:43 AM Dezhi Liu wrote: > >> +1 (non-binding) >> >> Thanks, >> Dezhi Liu >> >> On 2024/01/15 09:33:48 Zixuan Liu wrote: >>> Hi Pulsar Community, >>> >>> Voting for PIP-330: getMessagesById gets all messages >>> >>> PIP: https://github.com/apache/pulsar/pull/21873 >>> Discussion thread: >>> https://lists.apache.org/thread/vqyh3mvtvovd383sd8zxnlzsspdr863z >>> >>> Thanks, >>> Zixuan >>> >>
[VOTE] Release Apache Pulsar Helm Chart 3.2.0 based on 3.2.0-candidate-1
Hello Apache Pulsar Community, This is a call for the vote to release the Apache Pulsar Helm Chart version 3.2.0. Release notes for 3.2.0-candidate-1: https://github.com/apache/pulsar-helm-chart/releases/tag/pulsar-3.2.0-candidate-1 The release candidate is available at: https://dist.apache.org/repos/dist/dev/pulsar/helm-chart/3.2.0-candidate-1/ pulsar-chart-3.2.0-source.tar.gz - is the "main source release". pulsar-3.2.0.tgz - is the binary Helm Chart release. Public keys are available at: https://www.apache.org/dist/pulsar/KEYS For convenience "index.yaml" has been uploaded (though excluded from voting), so you can also run the below commands. helm repo add --force-update apache-pulsar-dist-dev https://dist.apache.org/repos/dist/dev/pulsar/helm-chart/3.2.0-candidate-1/ helm repo update helm install pulsar apache-pulsar-dist-dev/pulsar --version 3.2.0 --set affinity.anti_affinity=false pulsar-3.2.0.tgz.prov - is also uploaded for verifying Chart Integrity, though it is not strictly required for releasing the artifact based on ASF Guidelines. You can optionally verify this file using this helm plugin https://github.com/technosophos/helm-gpg, or by using helm --verify (https://helm.sh/docs/helm/helm_verify/). helm fetch --prov apache-pulsar-dist-dev/pulsar helm plugin install https://github.com/technosophos/helm-gpg helm gpg verify pulsar-3.2.0.tgz The vote will be open for at least 72 hours. Only votes from PMC members are binding, but members of the community are encouraged to test the release and vote with "(non-binding)". For license checks, the .rat-excludes files is included, so you can run the following to verify licenses (just update ): tar -xvf pulsar-chart-3.2.0-source.tar.gz cd pulsar-chart-3.2.0 java -jar /apache-rat-0.15/apache-rat-0.15.jar . -E .rat-excludes Please note that the version number excludes the `-candidate-X` string, so it's now simply 3.2.0. This will allow us to rename the artifact without modifying the artifact checksums when we actually release it. Thanks, Lari
Re: [DISCUSS] Always set a broker side timestamp for message and deprecate some API
Only enabled `AppendBrokerTimestampMetadataInterceptor` is not enough. We should also improve the TTL: When client publish time > Ledger create time + Ledger rollover time, and `brokerPublishTime` is not set, we can make Ledger TTL time = Ledger create time + Ledger rollover time. This change can make sure entry expired when client clock is not right. On 2024/01/17 04:31:16 PengHui Li wrote: > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not mean > that they allow this bug. > This is a configuration, not an API. It is difficult to use documentation > to regulate user behavior. > > Actually, they need to know why they want to disable ` > AppendBrokerTimestampMetadataInterceptor`. > Otherwise, why do they want to disable `AppendBrokerTimestampMetadata > Interceptor`? > > Pulsar's default configuration tries to provide a best practice for most of > the cases. To avoid the potential > problem, Pulsar enables `AppendBrokerTimestampMetadataInterceptor` by > default. But it doesn't mean all the cases > should enable `AppendBrokerTimestampMetadataInterceptor`. If users don't > use TTL, the producers are > well-maintained. Is there any reason to have at least 16 bytes append to > each entry? The default configuration > behavior change also needs to be highlighted in the release note. Users > need to know if they don't disable the > `AppendBrokerTimestampMetadataInterceptor` manually on a newer version, the > retained data size will increase. > > BTW, we need to explain each interceptor in the `broker.conf` and why ` > AppendBrokerTimestampMetadataInterceptor` > is enabled by default. If users don't want to read it, change anything in > `broker.conf` they don't really know what it is. > How can they know what they expect? > > Regards, > Penghui > > > On Mon, Jan 15, 2024 at 11:09 PM Lin Lin wrote: > > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not mean > > that they allow this bug. > > This is a configuration, not an API. It is difficult to use documentation > > to regulate user behavior. > > > > Maybe we can add a new field (BrokerTimestamp) to save the timestamp on > > the Broker side. > > The time priority for trimming Ledger is as follows: > > > > BrokerPublishTime > BrokerTimestamp > > > > If `BrokerPublishTime` exists, `BrokerReceiveTime` is not set. > > If not, we set `BrokerReceiveTime` and is no longer affected by client > > time. > > > > On 2024/01/15 02:15:17 PengHui Li wrote: > > > IMO, we should enable `AppendBrokerTimestampMetadataInterceptor` by > > default. > > > Users can still have a way to disable it if they care about the > > additional > > > metadata stored > > > in each entry. > > > > > > For the `hasBrokerPublishTime` API. The topic might also have historical > > > data without > > > broker publish time. So, it should be fine to keep this API because we > > > don't know how > > > long users will retain their data. > > > > > > Regards, > > > Penghui > > > > > > On Sat, Jan 6, 2024 at 10:35 PM linlin wrote: > > > > > > > Now, if the message's metadata does not set a broker side timestamp, > > the > > > > ledger expiration check is based on the client's publish time. > > > > > > > > When the client machine's clock is incorrect (eg: set to 1 year later) > > , > > > > the ledger can not be cleaned up. Issue > > > > https://github.com/apache/pulsar/issues/21347 > > > > > > > > `AppendBrokerTimestampMetadataInterceptor` can set timestamp for > > messages > > > > on the broker side, but we can not ensure that the > > > > `AppendBrokerTimestampMetadataInterceptor` is always enable > > > > > > > > Therefore, I open this PR(https://github.com/apache/pulsar/pull/21835) > > to > > > > always set the broker timestamp for messages on the broker side. > > > > > > > > With this change , firstly we should deprecate > > > > AppendBrokerTimestampMetadataInterceptor. > > > > It no longer needs to exist > > > > > > > > Secondly, we should deprecate `hasBrokerPublishTime` in interface > > Message. > > > > It always returns true. > > > > This API is created in PR (https://github.com/apache/pulsar/pull/11553 > > ) > > > > This PR is for the client to obtain BrokerPublishTime, so the > > > > `hasBrokerPublishTime` API is not necessary. > > > > > > > > > >
Re: [VOTE] PIP-330: getMessagesById gets all messages
+1 (binding) Regards Jiwei Guo (Tboy) On Wed, Jan 17, 2024 at 5:39 PM 太上玄元道君 wrote: > +1 nonbinding > > > > 2024年1月17日 17:35,Asaf Mesika 写道: > > > > +1 (non-binding) > > > > On Tue, Jan 16, 2024 at 4:43 AM Dezhi Liu wrote: > > > >> +1 (non-binding) > >> > >> Thanks, > >> Dezhi Liu > >> > >> On 2024/01/15 09:33:48 Zixuan Liu wrote: > >>> Hi Pulsar Community, > >>> > >>> Voting for PIP-330: getMessagesById gets all messages > >>> > >>> PIP: https://github.com/apache/pulsar/pull/21873 > >>> Discussion thread: > >>> https://lists.apache.org/thread/vqyh3mvtvovd383sd8zxnlzsspdr863z > >>> > >>> Thanks, > >>> Zixuan > >>> > >> > >
Re: [VOTE] PIP-330: getMessagesById gets all messages
+1 (binding) Thanks, Yunze On Wed, Jan 17, 2024 at 6:55 PM guo jiwei wrote: > > +1 (binding) > > > Regards > Jiwei Guo (Tboy) > > > On Wed, Jan 17, 2024 at 5:39 PM 太上玄元道君 wrote: > > > +1 nonbinding > > > > > > > 2024年1月17日 17:35,Asaf Mesika 写道: > > > > > > +1 (non-binding) > > > > > > On Tue, Jan 16, 2024 at 4:43 AM Dezhi Liu wrote: > > > > > >> +1 (non-binding) > > >> > > >> Thanks, > > >> Dezhi Liu > > >> > > >> On 2024/01/15 09:33:48 Zixuan Liu wrote: > > >>> Hi Pulsar Community, > > >>> > > >>> Voting for PIP-330: getMessagesById gets all messages > > >>> > > >>> PIP: https://github.com/apache/pulsar/pull/21873 > > >>> Discussion thread: > > >>> https://lists.apache.org/thread/vqyh3mvtvovd383sd8zxnlzsspdr863z > > >>> > > >>> Thanks, > > >>> Zixuan > > >>> > > >> > > > >
Re: [DISCUSS] Always set a broker side timestamp for message and deprecate some API
>From my perspective, it's a serious security issue. The client side could easily make Pulsar's message expiry mechanism not work. Even if it's not a malicious change made by users, it would be hard to figure out what makes the message not expired. > Users can still have a way to disable it if they care about the additional > metadata stored in each entry. I don't think we should provide such an option for users. Compared with the serious security issue, the extra overhead (13 bytes) per entry (not per message) should never be a concern. My suggestion is that the broker entry metadata with the broker side timestamp should always be generated and we should not allow users to disable it. However, `hasBrokerPublishTime` should not be deprecated. The broker entry metadata won't be transferred to the client unless `exposingBrokerEntryMetadataToClientEnabled` is true. Thanks, Yunze On Wed, Jan 17, 2024 at 6:45 PM Lin Lin wrote: > > Only enabled `AppendBrokerTimestampMetadataInterceptor` is not enough. > > We should also improve the TTL: > When client publish time > Ledger create time + Ledger rollover time, > and `brokerPublishTime` is not set, > we can make Ledger TTL time = Ledger create time + Ledger rollover time. > > This change can make sure entry expired when client clock is not right. > > On 2024/01/17 04:31:16 PengHui Li wrote: > > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not mean > > that they allow this bug. > > This is a configuration, not an API. It is difficult to use documentation > > to regulate user behavior. > > > > Actually, they need to know why they want to disable ` > > AppendBrokerTimestampMetadataInterceptor`. > > Otherwise, why do they want to disable `AppendBrokerTimestampMetadata > > Interceptor`? > > > > Pulsar's default configuration tries to provide a best practice for most of > > the cases. To avoid the potential > > problem, Pulsar enables `AppendBrokerTimestampMetadataInterceptor` by > > default. But it doesn't mean all the cases > > should enable `AppendBrokerTimestampMetadataInterceptor`. If users don't > > use TTL, the producers are > > well-maintained. Is there any reason to have at least 16 bytes append to > > each entry? The default configuration > > behavior change also needs to be highlighted in the release note. Users > > need to know if they don't disable the > > `AppendBrokerTimestampMetadataInterceptor` manually on a newer version, the > > retained data size will increase. > > > > BTW, we need to explain each interceptor in the `broker.conf` and why ` > > AppendBrokerTimestampMetadataInterceptor` > > is enabled by default. If users don't want to read it, change anything in > > `broker.conf` they don't really know what it is. > > How can they know what they expect? > > > > Regards, > > Penghui > > > > > > On Mon, Jan 15, 2024 at 11:09 PM Lin Lin wrote: > > > > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not mean > > > that they allow this bug. > > > This is a configuration, not an API. It is difficult to use documentation > > > to regulate user behavior. > > > > > > Maybe we can add a new field (BrokerTimestamp) to save the timestamp on > > > the Broker side. > > > The time priority for trimming Ledger is as follows: > > > > > > BrokerPublishTime > BrokerTimestamp > > > > > > If `BrokerPublishTime` exists, `BrokerReceiveTime` is not set. > > > If not, we set `BrokerReceiveTime` and is no longer affected by client > > > time. > > > > > > On 2024/01/15 02:15:17 PengHui Li wrote: > > > > IMO, we should enable `AppendBrokerTimestampMetadataInterceptor` by > > > default. > > > > Users can still have a way to disable it if they care about the > > > additional > > > > metadata stored > > > > in each entry. > > > > > > > > For the `hasBrokerPublishTime` API. The topic might also have historical > > > > data without > > > > broker publish time. So, it should be fine to keep this API because we > > > > don't know how > > > > long users will retain their data. > > > > > > > > Regards, > > > > Penghui > > > > > > > > On Sat, Jan 6, 2024 at 10:35 PM linlin wrote: > > > > > > > > > Now, if the message's metadata does not set a broker side timestamp, > > > the > > > > > ledger expiration check is based on the client's publish time. > > > > > > > > > > When the client machine's clock is incorrect (eg: set to 1 year later) > > > , > > > > > the ledger can not be cleaned up. Issue > > > > > https://github.com/apache/pulsar/issues/21347 > > > > > > > > > > `AppendBrokerTimestampMetadataInterceptor` can set timestamp for > > > messages > > > > > on the broker side, but we can not ensure that the > > > > > `AppendBrokerTimestampMetadataInterceptor` is always enable > > > > > > > > > > Therefore, I open this PR(https://github.com/apache/pulsar/pull/21835) > > > to > > > > > always set the broker timestamp for messages on the broker side. > > > > > > > > > > With this change , firstly we should deprecate > > > > > AppendBrokerTimestamp
Re: [DISCUSS] Always set a broker side timestamp for message and deprecate some API
> We should also improve the TTL: > When client publish time > Ledger create time + Ledger rollover time, > and `brokerPublishTime` is not set, > we can make Ledger TTL time = Ledger create time + Ledger rollover time. > This change can make sure entry expires when client clock is not right. This solution looks good to me. Regards, Penghui On Wed, Jan 17, 2024 at 10:31 PM Yunze Xu wrote: > From my perspective, it's a serious security issue. The client side > could easily make Pulsar's message expiry mechanism not work. Even if > it's not a malicious change made by users, it would be hard to figure > out what makes the message not expired. > > > Users can still have a way to disable it if they care about the > additional metadata stored in each entry. > > I don't think we should provide such an option for users. Compared > with the serious security issue, the extra overhead (13 bytes) per > entry (not per message) should never be a concern. > > My suggestion is that the broker entry metadata with the broker side > timestamp should always be generated and we should not allow users to > disable it. However, `hasBrokerPublishTime` should not be deprecated. > The broker entry metadata won't be transferred to the client unless > `exposingBrokerEntryMetadataToClientEnabled` is true. > > Thanks, > Yunze > > On Wed, Jan 17, 2024 at 6:45 PM Lin Lin wrote: > > > > Only enabled `AppendBrokerTimestampMetadataInterceptor` is not enough. > > > > We should also improve the TTL: > > When client publish time > Ledger create time + Ledger rollover time, > > and `brokerPublishTime` is not set, > > we can make Ledger TTL time = Ledger create time + Ledger rollover time. > > > > This change can make sure entry expired when client clock is not right. > > > > On 2024/01/17 04:31:16 PengHui Li wrote: > > > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not > mean > > > that they allow this bug. > > > This is a configuration, not an API. It is difficult to use > documentation > > > to regulate user behavior. > > > > > > Actually, they need to know why they want to disable ` > > > AppendBrokerTimestampMetadataInterceptor`. > > > Otherwise, why do they want to disable `AppendBrokerTimestampMetadata > > > Interceptor`? > > > > > > Pulsar's default configuration tries to provide a best practice for > most of > > > the cases. To avoid the potential > > > problem, Pulsar enables `AppendBrokerTimestampMetadataInterceptor` by > > > default. But it doesn't mean all the cases > > > should enable `AppendBrokerTimestampMetadataInterceptor`. If users > don't > > > use TTL, the producers are > > > well-maintained. Is there any reason to have at least 16 bytes append > to > > > each entry? The default configuration > > > behavior change also needs to be highlighted in the release note. Users > > > need to know if they don't disable the > > > `AppendBrokerTimestampMetadataInterceptor` manually on a newer > version, the > > > retained data size will increase. > > > > > > BTW, we need to explain each interceptor in the `broker.conf` and why ` > > > AppendBrokerTimestampMetadataInterceptor` > > > is enabled by default. If users don't want to read it, change anything > in > > > `broker.conf` they don't really know what it is. > > > How can they know what they expect? > > > > > > Regards, > > > Penghui > > > > > > > > > On Mon, Jan 15, 2024 at 11:09 PM Lin Lin wrote: > > > > > > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not > mean > > > > that they allow this bug. > > > > This is a configuration, not an API. It is difficult to use > documentation > > > > to regulate user behavior. > > > > > > > > Maybe we can add a new field (BrokerTimestamp) to save the timestamp > on > > > > the Broker side. > > > > The time priority for trimming Ledger is as follows: > > > > > > > > BrokerPublishTime > BrokerTimestamp > > > > > > > > If `BrokerPublishTime` exists, `BrokerReceiveTime` is not set. > > > > If not, we set `BrokerReceiveTime` and is no longer affected by > client > > > > time. > > > > > > > > On 2024/01/15 02:15:17 PengHui Li wrote: > > > > > IMO, we should enable `AppendBrokerTimestampMetadataInterceptor` by > > > > default. > > > > > Users can still have a way to disable it if they care about the > > > > additional > > > > > metadata stored > > > > > in each entry. > > > > > > > > > > For the `hasBrokerPublishTime` API. The topic might also have > historical > > > > > data without > > > > > broker publish time. So, it should be fine to keep this API > because we > > > > > don't know how > > > > > long users will retain their data. > > > > > > > > > > Regards, > > > > > Penghui > > > > > > > > > > On Sat, Jan 6, 2024 at 10:35 PM linlin wrote: > > > > > > > > > > > Now, if the message's metadata does not set a broker side > timestamp, > > > > the > > > > > > ledger expiration check is based on the client's publish time. > > > > > > > > > > > > When the client machine's clock is incorrect (eg: set to
Re: [DISCUSS] Always set a broker side timestamp for message and deprecate some API
> I don't think we should provide such an option for users. Compared with the serious security issue, the extra overhead (13 bytes) per entry (not per message) should never be a concern. It actually changed the stored data format. As far as I know, there are some users who read data from the offloaded data files directly. So, at least, we should provide a way to allow them to avoid the stored data format change. And 13 bytes are also related to the use case. If users don't use TTL and send too many small messages without batch. The cost will be increased by 10% or 20%. The issue will not damage the data or cause the data loss. You can monitor it, to skip the backlogs. TTL is not the only way to evict backlogs. Thanks, Penghui On Thu, Jan 18, 2024 at 10:33 AM PengHui Li wrote: > > We should also improve the TTL: > > When client publish time > Ledger create time + Ledger rollover time, > > and `brokerPublishTime` is not set, > > we can make Ledger TTL time = Ledger create time + Ledger rollover > time. > > > This change can make sure entry expires when client clock is not right. > > This solution looks good to me. > > Regards, > Penghui > > > On Wed, Jan 17, 2024 at 10:31 PM Yunze Xu wrote: > >> From my perspective, it's a serious security issue. The client side >> could easily make Pulsar's message expiry mechanism not work. Even if >> it's not a malicious change made by users, it would be hard to figure >> out what makes the message not expired. >> >> > Users can still have a way to disable it if they care about the >> additional metadata stored in each entry. >> >> I don't think we should provide such an option for users. Compared >> with the serious security issue, the extra overhead (13 bytes) per >> entry (not per message) should never be a concern. >> >> My suggestion is that the broker entry metadata with the broker side >> timestamp should always be generated and we should not allow users to >> disable it. However, `hasBrokerPublishTime` should not be deprecated. >> The broker entry metadata won't be transferred to the client unless >> `exposingBrokerEntryMetadataToClientEnabled` is true. >> >> Thanks, >> Yunze >> >> On Wed, Jan 17, 2024 at 6:45 PM Lin Lin wrote: >> > >> > Only enabled `AppendBrokerTimestampMetadataInterceptor` is not enough. >> > >> > We should also improve the TTL: >> > When client publish time > Ledger create time + Ledger rollover time, >> > and `brokerPublishTime` is not set, >> > we can make Ledger TTL time = Ledger create time + Ledger rollover >> time. >> > >> > This change can make sure entry expired when client clock is not right. >> > >> > On 2024/01/17 04:31:16 PengHui Li wrote: >> > > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not >> mean >> > > that they allow this bug. >> > > This is a configuration, not an API. It is difficult to use >> documentation >> > > to regulate user behavior. >> > > >> > > Actually, they need to know why they want to disable ` >> > > AppendBrokerTimestampMetadataInterceptor`. >> > > Otherwise, why do they want to disable `AppendBrokerTimestampMetadata >> > > Interceptor`? >> > > >> > > Pulsar's default configuration tries to provide a best practice for >> most of >> > > the cases. To avoid the potential >> > > problem, Pulsar enables `AppendBrokerTimestampMetadataInterceptor` by >> > > default. But it doesn't mean all the cases >> > > should enable `AppendBrokerTimestampMetadataInterceptor`. If users >> don't >> > > use TTL, the producers are >> > > well-maintained. Is there any reason to have at least 16 bytes append >> to >> > > each entry? The default configuration >> > > behavior change also needs to be highlighted in the release note. >> Users >> > > need to know if they don't disable the >> > > `AppendBrokerTimestampMetadataInterceptor` manually on a newer >> version, the >> > > retained data size will increase. >> > > >> > > BTW, we need to explain each interceptor in the `broker.conf` and why >> ` >> > > AppendBrokerTimestampMetadataInterceptor` >> > > is enabled by default. If users don't want to read it, change >> anything in >> > > `broker.conf` they don't really know what it is. >> > > How can they know what they expect? >> > > >> > > Regards, >> > > Penghui >> > > >> > > >> > > On Mon, Jan 15, 2024 at 11:09 PM Lin Lin wrote: >> > > >> > > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not >> mean >> > > > that they allow this bug. >> > > > This is a configuration, not an API. It is difficult to use >> documentation >> > > > to regulate user behavior. >> > > > >> > > > Maybe we can add a new field (BrokerTimestamp) to save the >> timestamp on >> > > > the Broker side. >> > > > The time priority for trimming Ledger is as follows: >> > > > >> > > > BrokerPublishTime > BrokerTimestamp >> > > > >> > > > If `BrokerPublishTime` exists, `BrokerReceiveTime` is not set. >> > > > If not, we set `BrokerReceiveTime` and is no longer affected by >> client >> > > > time. >> > > > >> > > > On
Re: [VOTE] PIP-330: getMessagesById gets all messages
+1 (binding) Regards, Penghui On Wed, Jan 17, 2024 at 8:03 PM Yunze Xu wrote: > +1 (binding) > > Thanks, > Yunze > > On Wed, Jan 17, 2024 at 6:55 PM guo jiwei wrote: > > > > +1 (binding) > > > > > > Regards > > Jiwei Guo (Tboy) > > > > > > On Wed, Jan 17, 2024 at 5:39 PM 太上玄元道君 wrote: > > > > > +1 nonbinding > > > > > > > > > > 2024年1月17日 17:35,Asaf Mesika 写道: > > > > > > > > +1 (non-binding) > > > > > > > > On Tue, Jan 16, 2024 at 4:43 AM Dezhi Liu > wrote: > > > > > > > >> +1 (non-binding) > > > >> > > > >> Thanks, > > > >> Dezhi Liu > > > >> > > > >> On 2024/01/15 09:33:48 Zixuan Liu wrote: > > > >>> Hi Pulsar Community, > > > >>> > > > >>> Voting for PIP-330: getMessagesById gets all messages > > > >>> > > > >>> PIP: https://github.com/apache/pulsar/pull/21873 > > > >>> Discussion thread: > > > >>> https://lists.apache.org/thread/vqyh3mvtvovd383sd8zxnlzsspdr863z > > > >>> > > > >>> Thanks, > > > >>> Zixuan > > > >>> > > > >> > > > > > > >
Re: [DISCUSS] Always set a broker side timestamp for message and deprecate some API
I see, it only affects the TTL. We can acknowledge the messages to clean up the data. Then we have to document the `brokerInterceptors` config clearly to say TTL might not work if `AppendBrokerTimestampMetadataInterceptor` is not configured. Thanks, Yunze On Thu, Jan 18, 2024 at 10:41 AM PengHui Li wrote: > > > I don't think we should provide such an option for users. Compared > with the serious security issue, the extra overhead (13 bytes) per > entry (not per message) should never be a concern. > > It actually changed the stored data format. As far as I know, there are > some users > who read data from the offloaded data files directly. So, at least, we > should provide > a way to allow them to avoid the stored data format change. > > And 13 bytes are also related to the use case. If users don't use TTL and > send > too many small messages without batch. The cost will be increased by 10% or > 20%. > The issue will not damage the data or cause the data loss. > You can monitor it, to skip the backlogs. TTL is not the only way to evict > backlogs. > > Thanks, > Penghui > > On Thu, Jan 18, 2024 at 10:33 AM PengHui Li wrote: > > > > We should also improve the TTL: > > > When client publish time > Ledger create time + Ledger rollover time, > > > and `brokerPublishTime` is not set, > > > we can make Ledger TTL time = Ledger create time + Ledger rollover > > time. > > > > > This change can make sure entry expires when client clock is not right. > > > > This solution looks good to me. > > > > Regards, > > Penghui > > > > > > On Wed, Jan 17, 2024 at 10:31 PM Yunze Xu wrote: > > > >> From my perspective, it's a serious security issue. The client side > >> could easily make Pulsar's message expiry mechanism not work. Even if > >> it's not a malicious change made by users, it would be hard to figure > >> out what makes the message not expired. > >> > >> > Users can still have a way to disable it if they care about the > >> additional metadata stored in each entry. > >> > >> I don't think we should provide such an option for users. Compared > >> with the serious security issue, the extra overhead (13 bytes) per > >> entry (not per message) should never be a concern. > >> > >> My suggestion is that the broker entry metadata with the broker side > >> timestamp should always be generated and we should not allow users to > >> disable it. However, `hasBrokerPublishTime` should not be deprecated. > >> The broker entry metadata won't be transferred to the client unless > >> `exposingBrokerEntryMetadataToClientEnabled` is true. > >> > >> Thanks, > >> Yunze > >> > >> On Wed, Jan 17, 2024 at 6:45 PM Lin Lin wrote: > >> > > >> > Only enabled `AppendBrokerTimestampMetadataInterceptor` is not enough. > >> > > >> > We should also improve the TTL: > >> > When client publish time > Ledger create time + Ledger rollover time, > >> > and `brokerPublishTime` is not set, > >> > we can make Ledger TTL time = Ledger create time + Ledger rollover > >> time. > >> > > >> > This change can make sure entry expired when client clock is not right. > >> > > >> > On 2024/01/17 04:31:16 PengHui Li wrote: > >> > > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not > >> mean > >> > > that they allow this bug. > >> > > This is a configuration, not an API. It is difficult to use > >> documentation > >> > > to regulate user behavior. > >> > > > >> > > Actually, they need to know why they want to disable ` > >> > > AppendBrokerTimestampMetadataInterceptor`. > >> > > Otherwise, why do they want to disable `AppendBrokerTimestampMetadata > >> > > Interceptor`? > >> > > > >> > > Pulsar's default configuration tries to provide a best practice for > >> most of > >> > > the cases. To avoid the potential > >> > > problem, Pulsar enables `AppendBrokerTimestampMetadataInterceptor` by > >> > > default. But it doesn't mean all the cases > >> > > should enable `AppendBrokerTimestampMetadataInterceptor`. If users > >> don't > >> > > use TTL, the producers are > >> > > well-maintained. Is there any reason to have at least 16 bytes append > >> to > >> > > each entry? The default configuration > >> > > behavior change also needs to be highlighted in the release note. > >> Users > >> > > need to know if they don't disable the > >> > > `AppendBrokerTimestampMetadataInterceptor` manually on a newer > >> version, the > >> > > retained data size will increase. > >> > > > >> > > BTW, we need to explain each interceptor in the `broker.conf` and why > >> ` > >> > > AppendBrokerTimestampMetadataInterceptor` > >> > > is enabled by default. If users don't want to read it, change > >> anything in > >> > > `broker.conf` they don't really know what it is. > >> > > How can they know what they expect? > >> > > > >> > > Regards, > >> > > Penghui > >> > > > >> > > > >> > > On Mon, Jan 15, 2024 at 11:09 PM Lin Lin wrote: > >> > > > >> > > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not > >> mean > >> > > > that they allow this bug. > >> > > >
Re: [VERIFY] Pulsar Release 3.2.0 Candidate 2
It looks like something broke the behavior of the querystate from Pulsar Functions. 1. The querystate exit with `Reason: key 'hello' doesn't exist`, which the old version will not exit ``` lipenghui@lipenghuis-MacBook-Pro apache-pulsar-3.2.0 % bin/pulsar-admin functions querystate --tenant test --namespace test-namespace --name word_count -k hello -w key 'hello' doesn't exist. Reason: key 'hello' doesn't exist. ``` 2. The output missed the `version` field ``` "key": "hello", "numberValue": 20, "version": 19 ``` ``` "key": "hello", "stringValue": "\u\u\u\u\u\u\u\n", "numberValue": 10 ``` Regards, Penghui On Tue, Jan 16, 2024 at 6:34 PM Zixuan Liu wrote: > +1 (non-binding) > > - Checked the checksums and signatures > - Built with Temurin-17.0.6+10 > - Run standalone > - Checked producer and consumer > > Thanks, > Zixuan > > guo jiwei 于2024年1月16日周二 15:57写道: > > > This is the second release candidate for Apache Pulsar version 3.2.0. > > > > It fixes the following issues: > > https://github.com/apache/pulsar/milestone/36?closed=1 > > > > *** Please download, test and verify on this release. This release > > candidate verification will stay open until Jan 15 *** > > > > Note that we are verifying upon the source (tag), binaries are provided > for > > convenience. > > > > Source and binary files: > > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-3.2.0-candidate-2/ > > > > SHA-512 checksums: > > > > > > > 57af4de0531baada79ff3eb5e51659ae3d2b6732840a27861636d3538d8cca2b3aeb77d3f5cecf37dc5d8a0de11a05d0775ae2d2ca9398f9747fbe662e1cdeae > > > > apache-pulsar-3.2.0-bin.tar.gz > > > > > > > 7ddd8df261f4ffeed2e4f029aa475f523bb9505869870db681a57b7df25af01d5670ba00bc7e062ddae0b962eaed9ee828a4cd7f4744a965afd77e9ae1239d6a > > > > apache-pulsar-3.2.0-src.tar.gz > > > > Maven staging repo: > > https://repository.apache.org/content/repositories/orgapachepulsar-1262/ > > > > The tag to verify: > > v3.2.0-candidate-2 (5348e1b9124052a454f66769ae3e9f54ee0a75d4) > > https://github.com/apache/pulsar/commits/v3.2.0-candidate-2/ > > > > Pulsar's KEYS file containing PGP keys you use to sign the release: > > https://dist.apache.org/repos/dist/dev/pulsar/KEYS > > > > Docker images: > > > > pulsar images: > > > > > https://hub.docker.com/layers/technoboy8/pulsar/3.2.0-5348e1b/images/sha256-cf02a08e588ca493d10dbe472b72f5e64e4f90b8b5ae01e91321f336776a5a4e?context=repo > > < > > > https://hub.docker.com/layers/mattison/pulsar/3.1.0-candidate-1/images/sha256-0efbaad7d893cc5041a46a2d4d56432bda855ae4068a38349777d1be6e98d27d?context=explore > > > > > pulsar-all images: > > > > > https://hub.docker.com/layers/technoboy8/pulsar-all/3.2.0-5348e1b/images/sha256-801e395fc26257e2beb5ea0574fe27f5a4ff956c85cbcd6b362432d6dda28b95?context=repo > > > > Please download the source package, and follow the README to build > > and run the Pulsar standalone service. > > > > Note that this RC doesn't require a formal vote, but we would also > > appreciate your feedback with +1/-1. And please provide specific > > comments if your feedback is not +1. > > > > > > Regards > > Jiwei Guo (Tboy) > > >
Re: [DISCUSS] Always set a broker side timestamp for message and deprecate some API
> I see, it only affects the TTL. We can acknowledge the messages to clean up the data. Then we have to document the `brokerInterceptors` config clearly to say TTL might not work if `AppendBrokerTimestampMetadataInterceptor` is not configured. Yes, and LinLin figured out a better solution to mitigate this issue. > We should also improve the TTL: > When client publish time > Ledger create time + Ledger rollover time, > and `brokerPublishTime` is not set, > we can make Ledger TTL time = Ledger create time + Ledger rollover time. BTW, I think we can use the timestamp in the Ledger Info directly. It's not the ledger creation time, it should be the close timestamp. Regards, Penghui On Thu, Jan 18, 2024 at 11:25 AM Yunze Xu wrote: > I see, it only affects the TTL. We can acknowledge the messages to > clean up the data. Then we have to document the `brokerInterceptors` > config clearly to say TTL might not work if > `AppendBrokerTimestampMetadataInterceptor` is not configured. > > Thanks, > Yunze > > On Thu, Jan 18, 2024 at 10:41 AM PengHui Li wrote: > > > > > I don't think we should provide such an option for users. Compared > > with the serious security issue, the extra overhead (13 bytes) per > > entry (not per message) should never be a concern. > > > > It actually changed the stored data format. As far as I know, there are > > some users > > who read data from the offloaded data files directly. So, at least, we > > should provide > > a way to allow them to avoid the stored data format change. > > > > And 13 bytes are also related to the use case. If users don't use TTL and > > send > > too many small messages without batch. The cost will be increased by 10% > or > > 20%. > > The issue will not damage the data or cause the data loss. > > You can monitor it, to skip the backlogs. TTL is not the only way to > evict > > backlogs. > > > > Thanks, > > Penghui > > > > On Thu, Jan 18, 2024 at 10:33 AM PengHui Li wrote: > > > > > > We should also improve the TTL: > > > > When client publish time > Ledger create time + Ledger rollover time, > > > > and `brokerPublishTime` is not set, > > > > we can make Ledger TTL time = Ledger create time + Ledger rollover > > > time. > > > > > > > This change can make sure entry expires when client clock is not > right. > > > > > > This solution looks good to me. > > > > > > Regards, > > > Penghui > > > > > > > > > On Wed, Jan 17, 2024 at 10:31 PM Yunze Xu wrote: > > > > > >> From my perspective, it's a serious security issue. The client side > > >> could easily make Pulsar's message expiry mechanism not work. Even if > > >> it's not a malicious change made by users, it would be hard to figure > > >> out what makes the message not expired. > > >> > > >> > Users can still have a way to disable it if they care about the > > >> additional metadata stored in each entry. > > >> > > >> I don't think we should provide such an option for users. Compared > > >> with the serious security issue, the extra overhead (13 bytes) per > > >> entry (not per message) should never be a concern. > > >> > > >> My suggestion is that the broker entry metadata with the broker side > > >> timestamp should always be generated and we should not allow users to > > >> disable it. However, `hasBrokerPublishTime` should not be deprecated. > > >> The broker entry metadata won't be transferred to the client unless > > >> `exposingBrokerEntryMetadataToClientEnabled` is true. > > >> > > >> Thanks, > > >> Yunze > > >> > > >> On Wed, Jan 17, 2024 at 6:45 PM Lin Lin wrote: > > >> > > > >> > Only enabled `AppendBrokerTimestampMetadataInterceptor` is not > enough. > > >> > > > >> > We should also improve the TTL: > > >> > When client publish time > Ledger create time + Ledger rollover > time, > > >> > and `brokerPublishTime` is not set, > > >> > we can make Ledger TTL time = Ledger create time + Ledger rollover > > >> time. > > >> > > > >> > This change can make sure entry expired when client clock is not > right. > > >> > > > >> > On 2024/01/17 04:31:16 PengHui Li wrote: > > >> > > > User disabling `AppendBrokerTimestampMetadataInterceptor` does > not > > >> mean > > >> > > that they allow this bug. > > >> > > This is a configuration, not an API. It is difficult to use > > >> documentation > > >> > > to regulate user behavior. > > >> > > > > >> > > Actually, they need to know why they want to disable ` > > >> > > AppendBrokerTimestampMetadataInterceptor`. > > >> > > Otherwise, why do they want to disable > `AppendBrokerTimestampMetadata > > >> > > Interceptor`? > > >> > > > > >> > > Pulsar's default configuration tries to provide a best practice > for > > >> most of > > >> > > the cases. To avoid the potential > > >> > > problem, Pulsar enables > `AppendBrokerTimestampMetadataInterceptor` by > > >> > > default. But it doesn't mean all the cases > > >> > > should enable `AppendBrokerTimestampMetadataInterceptor`. If users > > >> don't > > >> > > use TTL, the producers are > > >> > > well-maintained. Is there
Re: [VOTE] PIP-330: getMessagesById gets all messages
This PIP has been approved by 3 +1 (binding): - Jiwei - Yunze - Penghui Thanks, Zixuan Zixuan Liu 于2024年1月15日周一 17:33写道: > Hi Pulsar Community, > > Voting for PIP-330: getMessagesById gets all messages > > PIP: https://github.com/apache/pulsar/pull/21873 > Discussion thread: > https://lists.apache.org/thread/vqyh3mvtvovd383sd8zxnlzsspdr863z > > Thanks, > Zixuan >
[VOTE] Release Apache Pulsar Helm Chart 3.2.0 based on 3.2.0-candidate-2
Hello Apache Pulsar Community, This is a call for the vote to release the Apache Pulsar Helm Chart version 3.2.0. Release notes for 3.2.0-candidate-2: https://github.com/apache/pulsar-helm-chart/releases/tag/pulsar-3.2.0-candidate-2 The release candidate is available at: https://dist.apache.org/repos/dist/dev/pulsar/helm-chart/3.2.0-candidate-2/ pulsar-chart-3.2.0-source.tar.gz - is the "main source release". pulsar-3.2.0.tgz - is the binary Helm Chart release. Public keys are available at: https://www.apache.org/dist/pulsar/KEYS For convenience "index.yaml" has been uploaded (though excluded from voting), so you can also run the below commands. helm repo add --force-update apache-pulsar-dist-dev https://dist.apache.org/repos/dist/dev/pulsar/helm-chart/3.2.0-candidate-2/ helm repo update helm install pulsar apache-pulsar-dist-dev/pulsar --version 3.2.0 --set affinity.anti_affinity=false pulsar-3.2.0.tgz.prov - is also uploaded for verifying Chart Integrity, though it is not strictly required for releasing the artifact based on ASF Guidelines. You can optionally verify this file using this helm plugin https://github.com/technosophos/helm-gpg, or by using helm --verify (https://helm.sh/docs/helm/helm_verify/). helm fetch --prov apache-pulsar-dist-dev/pulsar helm plugin install https://github.com/technosophos/helm-gpg helm gpg verify pulsar-3.2.0.tgz The vote will be open for at least 72 hours. Only votes from PMC members are binding, but members of the community are encouraged to test the release and vote with "(non-binding)". For license checks, the .rat-excludes files is included, so you can run the following to verify licenses (just update ): tar -xvf pulsar-chart-3.2.0-source.tar.gz cd pulsar-chart-3.2.0 java -jar /apache-rat-0.15/apache-rat-0.15.jar . -E .rat-excludes Please note that the version number excludes the `-candidate-X` string, so it's now simply 3.2.0. This will allow us to rename the artifact without modifying the artifact checksums when we actually release it. Thanks, Lari
Re: [VOTE] Release Apache Pulsar Helm Chart 3.2.0 based on 3.2.0-candidate-1
This release vote is cancelled since there's a new vote out for Apache Pulsar Helm Chart 3.2.0-candidate-2: https://lists.apache.org/thread/f0zn9882wmkwcm768v1h38ys6z5qncto -Lari On 2024/01/17 10:07:32 Lari Hotari wrote: > Hello Apache Pulsar Community, > > This is a call for the vote to release the Apache Pulsar Helm Chart version > 3.2.0. > > Release notes for 3.2.0-candidate-1: > https://github.com/apache/pulsar-helm-chart/releases/tag/pulsar-3.2.0-candidate-1 > > The release candidate is available at: > https://dist.apache.org/repos/dist/dev/pulsar/helm-chart/3.2.0-candidate-1/ > > pulsar-chart-3.2.0-source.tar.gz - is the "main source release". > pulsar-3.2.0.tgz - is the binary Helm Chart release. > > Public keys are available at: https://www.apache.org/dist/pulsar/KEYS > > For convenience "index.yaml" has been uploaded (though excluded from voting), > so you can also run the below commands. > > helm repo add --force-update apache-pulsar-dist-dev > https://dist.apache.org/repos/dist/dev/pulsar/helm-chart/3.2.0-candidate-1/ > helm repo update > helm install pulsar apache-pulsar-dist-dev/pulsar --version 3.2.0 --set > affinity.anti_affinity=false > > pulsar-3.2.0.tgz.prov - is also uploaded for verifying Chart Integrity, > though it is not strictly required for releasing the artifact based on ASF > Guidelines. > > You can optionally verify this file using this helm plugin > https://github.com/technosophos/helm-gpg, or by using helm --verify > (https://helm.sh/docs/helm/helm_verify/). > > helm fetch --prov apache-pulsar-dist-dev/pulsar > helm plugin install https://github.com/technosophos/helm-gpg > helm gpg verify pulsar-3.2.0.tgz > > The vote will be open for at least 72 hours. > > Only votes from PMC members are binding, but members of the community are > encouraged to test the release and vote with "(non-binding)". > > For license checks, the .rat-excludes files is included, so you can run the > following to verify licenses (just update ): > > tar -xvf pulsar-chart-3.2.0-source.tar.gz > cd pulsar-chart-3.2.0 > java -jar /apache-rat-0.15/apache-rat-0.15.jar . -E .rat-excludes > > Please note that the version number excludes the `-candidate-X` string, so > it's now > simply 3.2.0. This will allow us to rename the artifact without modifying > the artifact checksums when we actually release it. > > Thanks, > Lari >