Re: [VOTE] PIP-330: getMessagesById gets all messages

2024-01-17 Thread Asaf Mesika
+1 (non-binding)

On Tue, Jan 16, 2024 at 4:43 AM Dezhi Liu  wrote:

> +1 (non-binding)
>
> Thanks,
> Dezhi Liu
>
> On 2024/01/15 09:33:48 Zixuan Liu wrote:
> > Hi Pulsar Community,
> >
> > Voting for PIP-330: getMessagesById gets all messages
> >
> > PIP: https://github.com/apache/pulsar/pull/21873
> > Discussion thread:
> > https://lists.apache.org/thread/vqyh3mvtvovd383sd8zxnlzsspdr863z
> >
> > Thanks,
> > Zixuan
> >
>


Re: [VOTE] PIP-330: getMessagesById gets all messages

2024-01-17 Thread 太上玄元道君
+1 nonbinding


> 2024年1月17日 17:35,Asaf Mesika  写道:
> 
> +1 (non-binding)
> 
> On Tue, Jan 16, 2024 at 4:43 AM Dezhi Liu  wrote:
> 
>> +1 (non-binding)
>> 
>> Thanks,
>> Dezhi Liu
>> 
>> On 2024/01/15 09:33:48 Zixuan Liu wrote:
>>> Hi Pulsar Community,
>>> 
>>> Voting for PIP-330: getMessagesById gets all messages
>>> 
>>> PIP: https://github.com/apache/pulsar/pull/21873
>>> Discussion thread:
>>> https://lists.apache.org/thread/vqyh3mvtvovd383sd8zxnlzsspdr863z
>>> 
>>> Thanks,
>>> Zixuan
>>> 
>> 



[VOTE] Release Apache Pulsar Helm Chart 3.2.0 based on 3.2.0-candidate-1

2024-01-17 Thread Lari Hotari
Hello Apache Pulsar Community,

This is a call for the vote to release the Apache Pulsar Helm Chart version 
3.2.0.

Release notes for 3.2.0-candidate-1:
https://github.com/apache/pulsar-helm-chart/releases/tag/pulsar-3.2.0-candidate-1

The release candidate is available at:
https://dist.apache.org/repos/dist/dev/pulsar/helm-chart/3.2.0-candidate-1/

pulsar-chart-3.2.0-source.tar.gz - is the "main source release".
pulsar-3.2.0.tgz - is the binary Helm Chart release.

Public keys are available at: https://www.apache.org/dist/pulsar/KEYS

For convenience "index.yaml" has been uploaded (though excluded from voting), 
so you can also run the below commands.

helm repo add --force-update apache-pulsar-dist-dev 
https://dist.apache.org/repos/dist/dev/pulsar/helm-chart/3.2.0-candidate-1/
helm repo update
helm install pulsar apache-pulsar-dist-dev/pulsar --version 3.2.0 --set 
affinity.anti_affinity=false

pulsar-3.2.0.tgz.prov - is also uploaded for verifying Chart Integrity, though 
it is not strictly required for releasing the artifact based on ASF Guidelines. 

You can optionally verify this file using this helm plugin 
https://github.com/technosophos/helm-gpg, or by using helm --verify 
(https://helm.sh/docs/helm/helm_verify/).

helm fetch --prov apache-pulsar-dist-dev/pulsar
helm plugin install https://github.com/technosophos/helm-gpg
helm gpg verify pulsar-3.2.0.tgz

The vote will be open for at least 72 hours.

Only votes from PMC members are binding, but members of the community are
encouraged to test the release and vote with "(non-binding)".

For license checks, the .rat-excludes files is included, so you can run the 
following to verify licenses (just update ):

tar -xvf pulsar-chart-3.2.0-source.tar.gz
cd pulsar-chart-3.2.0
java -jar /apache-rat-0.15/apache-rat-0.15.jar . -E .rat-excludes

Please note that the version number excludes the `-candidate-X` string, so it's 
now
simply 3.2.0. This will allow us to rename the artifact without modifying
the artifact checksums when we actually release it.

Thanks,
Lari


Re: [DISCUSS] Always set a broker side timestamp for message and deprecate some API

2024-01-17 Thread Lin Lin
Only enabled `AppendBrokerTimestampMetadataInterceptor` is not enough.

We should also improve the TTL:
When client publish time > Ledger create time + Ledger rollover time, 
and `brokerPublishTime` is not set,
we can make Ledger TTL time =  Ledger create time + Ledger rollover time.

This change can make sure entry expired when client clock is not right.

On 2024/01/17 04:31:16 PengHui Li wrote:
> > User disabling `AppendBrokerTimestampMetadataInterceptor` does not mean
> that they allow this bug.
> This is a configuration, not an API. It is difficult to use documentation
> to regulate user behavior.
> 
> Actually, they need to know why they want to disable `
> AppendBrokerTimestampMetadataInterceptor`.
> Otherwise, why do they want to disable `AppendBrokerTimestampMetadata
> Interceptor`?
> 
> Pulsar's default configuration tries to provide a best practice for most of
> the cases. To avoid the potential
> problem, Pulsar enables  `AppendBrokerTimestampMetadataInterceptor` by
> default. But it doesn't mean all the cases
> should enable `AppendBrokerTimestampMetadataInterceptor`. If users don't
> use TTL, the producers are
> well-maintained. Is there any reason to have at least 16 bytes append to
> each entry? The default configuration
> behavior change also needs to be highlighted in the release note. Users
> need to know if they don't disable the
> `AppendBrokerTimestampMetadataInterceptor` manually on a newer version, the
> retained data size will increase.
> 
> BTW, we need to explain each interceptor in the `broker.conf` and why `
> AppendBrokerTimestampMetadataInterceptor`
> is enabled by default. If users don't want to read it, change anything in
> `broker.conf` they don't really know what it is.
> How can they know what they expect?
> 
> Regards,
> Penghui
> 
> 
> On Mon, Jan 15, 2024 at 11:09 PM Lin Lin  wrote:
> 
> > User disabling `AppendBrokerTimestampMetadataInterceptor` does not mean
> > that they allow this bug.
> > This is a configuration, not an API. It is difficult to use documentation
> > to regulate user behavior.
> >
> > Maybe we can add a new field (BrokerTimestamp) to save the timestamp on
> > the Broker side.
> > The time priority for trimming Ledger is as follows:
> >
> > BrokerPublishTime > BrokerTimestamp
> >
> > If `BrokerPublishTime` exists, `BrokerReceiveTime` is not set.
> > If not, we set `BrokerReceiveTime`  and is no longer affected by client
> > time.
> >
> > On 2024/01/15 02:15:17 PengHui Li wrote:
> > > IMO, we should enable `AppendBrokerTimestampMetadataInterceptor` by
> > default.
> > > Users can still have a way to disable it if they care about the
> > additional
> > > metadata stored
> > > in each entry.
> > >
> > > For the `hasBrokerPublishTime` API. The topic might also have historical
> > > data without
> > > broker publish time. So, it should be fine to keep this API because we
> > > don't know how
> > > long users will retain their data.
> > >
> > > Regards,
> > > Penghui
> > >
> > > On Sat, Jan 6, 2024 at 10:35 PM linlin  wrote:
> > >
> > > > Now, if the message's metadata does not set a broker side timestamp,
> > the
> > > > ledger expiration check is based on the client's publish time.
> > > >
> > > > When the client machine's clock is incorrect (eg: set to 1 year later)
> > ,
> > > > the ledger can not be cleaned up. Issue
> > > > https://github.com/apache/pulsar/issues/21347
> > > >
> > > > `AppendBrokerTimestampMetadataInterceptor` can set timestamp for
> > messages
> > > > on the broker side, but we can not ensure that the
> > > > `AppendBrokerTimestampMetadataInterceptor` is always enable
> > > >
> > > > Therefore, I open this PR(https://github.com/apache/pulsar/pull/21835)
> > to
> > > > always set the broker timestamp for messages on the broker side.
> > > >
> > > > With this change , firstly we should deprecate
> > > > AppendBrokerTimestampMetadataInterceptor.
> > > > It no longer needs to exist
> > > >
> > > > Secondly, we should deprecate `hasBrokerPublishTime` in interface
> > Message.
> > > > It always returns true.
> > > > This API is created in PR (https://github.com/apache/pulsar/pull/11553
> > )
> > > > This PR is for the client to obtain BrokerPublishTime, so the
> > > > `hasBrokerPublishTime` API is not necessary.
> > > >
> > >
> >
> 


Re: [VOTE] PIP-330: getMessagesById gets all messages

2024-01-17 Thread guo jiwei
+1 (binding)


Regards
Jiwei Guo (Tboy)


On Wed, Jan 17, 2024 at 5:39 PM 太上玄元道君  wrote:

> +1 nonbinding
>
>
> > 2024年1月17日 17:35,Asaf Mesika  写道:
> >
> > +1 (non-binding)
> >
> > On Tue, Jan 16, 2024 at 4:43 AM Dezhi Liu  wrote:
> >
> >> +1 (non-binding)
> >>
> >> Thanks,
> >> Dezhi Liu
> >>
> >> On 2024/01/15 09:33:48 Zixuan Liu wrote:
> >>> Hi Pulsar Community,
> >>>
> >>> Voting for PIP-330: getMessagesById gets all messages
> >>>
> >>> PIP: https://github.com/apache/pulsar/pull/21873
> >>> Discussion thread:
> >>> https://lists.apache.org/thread/vqyh3mvtvovd383sd8zxnlzsspdr863z
> >>>
> >>> Thanks,
> >>> Zixuan
> >>>
> >>
>
>


Re: [VOTE] PIP-330: getMessagesById gets all messages

2024-01-17 Thread Yunze Xu
+1 (binding)

Thanks,
Yunze

On Wed, Jan 17, 2024 at 6:55 PM guo jiwei  wrote:
>
> +1 (binding)
>
>
> Regards
> Jiwei Guo (Tboy)
>
>
> On Wed, Jan 17, 2024 at 5:39 PM 太上玄元道君  wrote:
>
> > +1 nonbinding
> >
> >
> > > 2024年1月17日 17:35,Asaf Mesika  写道:
> > >
> > > +1 (non-binding)
> > >
> > > On Tue, Jan 16, 2024 at 4:43 AM Dezhi Liu  wrote:
> > >
> > >> +1 (non-binding)
> > >>
> > >> Thanks,
> > >> Dezhi Liu
> > >>
> > >> On 2024/01/15 09:33:48 Zixuan Liu wrote:
> > >>> Hi Pulsar Community,
> > >>>
> > >>> Voting for PIP-330: getMessagesById gets all messages
> > >>>
> > >>> PIP: https://github.com/apache/pulsar/pull/21873
> > >>> Discussion thread:
> > >>> https://lists.apache.org/thread/vqyh3mvtvovd383sd8zxnlzsspdr863z
> > >>>
> > >>> Thanks,
> > >>> Zixuan
> > >>>
> > >>
> >
> >


Re: [DISCUSS] Always set a broker side timestamp for message and deprecate some API

2024-01-17 Thread Yunze Xu
>From my perspective, it's a serious security issue. The client side
could easily make Pulsar's message expiry mechanism not work. Even if
it's not a malicious change made by users, it would be hard to figure
out what makes the message not expired.

> Users can still have a way to disable it if they care about the additional 
> metadata stored in each entry.

I don't think we should provide such an option for users. Compared
with the serious security issue, the extra overhead (13 bytes) per
entry (not per message) should never be a concern.

My suggestion is that the broker entry metadata with the broker side
timestamp should always be generated and we should not allow users to
disable it. However, `hasBrokerPublishTime` should not be deprecated.
The broker entry metadata won't be transferred to the client unless
`exposingBrokerEntryMetadataToClientEnabled` is true.

Thanks,
Yunze

On Wed, Jan 17, 2024 at 6:45 PM Lin Lin  wrote:
>
> Only enabled `AppendBrokerTimestampMetadataInterceptor` is not enough.
>
> We should also improve the TTL:
> When client publish time > Ledger create time + Ledger rollover time,
> and `brokerPublishTime` is not set,
> we can make Ledger TTL time =  Ledger create time + Ledger rollover time.
>
> This change can make sure entry expired when client clock is not right.
>
> On 2024/01/17 04:31:16 PengHui Li wrote:
> > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not mean
> > that they allow this bug.
> > This is a configuration, not an API. It is difficult to use documentation
> > to regulate user behavior.
> >
> > Actually, they need to know why they want to disable `
> > AppendBrokerTimestampMetadataInterceptor`.
> > Otherwise, why do they want to disable `AppendBrokerTimestampMetadata
> > Interceptor`?
> >
> > Pulsar's default configuration tries to provide a best practice for most of
> > the cases. To avoid the potential
> > problem, Pulsar enables  `AppendBrokerTimestampMetadataInterceptor` by
> > default. But it doesn't mean all the cases
> > should enable `AppendBrokerTimestampMetadataInterceptor`. If users don't
> > use TTL, the producers are
> > well-maintained. Is there any reason to have at least 16 bytes append to
> > each entry? The default configuration
> > behavior change also needs to be highlighted in the release note. Users
> > need to know if they don't disable the
> > `AppendBrokerTimestampMetadataInterceptor` manually on a newer version, the
> > retained data size will increase.
> >
> > BTW, we need to explain each interceptor in the `broker.conf` and why `
> > AppendBrokerTimestampMetadataInterceptor`
> > is enabled by default. If users don't want to read it, change anything in
> > `broker.conf` they don't really know what it is.
> > How can they know what they expect?
> >
> > Regards,
> > Penghui
> >
> >
> > On Mon, Jan 15, 2024 at 11:09 PM Lin Lin  wrote:
> >
> > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not mean
> > > that they allow this bug.
> > > This is a configuration, not an API. It is difficult to use documentation
> > > to regulate user behavior.
> > >
> > > Maybe we can add a new field (BrokerTimestamp) to save the timestamp on
> > > the Broker side.
> > > The time priority for trimming Ledger is as follows:
> > >
> > > BrokerPublishTime > BrokerTimestamp
> > >
> > > If `BrokerPublishTime` exists, `BrokerReceiveTime` is not set.
> > > If not, we set `BrokerReceiveTime`  and is no longer affected by client
> > > time.
> > >
> > > On 2024/01/15 02:15:17 PengHui Li wrote:
> > > > IMO, we should enable `AppendBrokerTimestampMetadataInterceptor` by
> > > default.
> > > > Users can still have a way to disable it if they care about the
> > > additional
> > > > metadata stored
> > > > in each entry.
> > > >
> > > > For the `hasBrokerPublishTime` API. The topic might also have historical
> > > > data without
> > > > broker publish time. So, it should be fine to keep this API because we
> > > > don't know how
> > > > long users will retain their data.
> > > >
> > > > Regards,
> > > > Penghui
> > > >
> > > > On Sat, Jan 6, 2024 at 10:35 PM linlin  wrote:
> > > >
> > > > > Now, if the message's metadata does not set a broker side timestamp,
> > > the
> > > > > ledger expiration check is based on the client's publish time.
> > > > >
> > > > > When the client machine's clock is incorrect (eg: set to 1 year later)
> > > ,
> > > > > the ledger can not be cleaned up. Issue
> > > > > https://github.com/apache/pulsar/issues/21347
> > > > >
> > > > > `AppendBrokerTimestampMetadataInterceptor` can set timestamp for
> > > messages
> > > > > on the broker side, but we can not ensure that the
> > > > > `AppendBrokerTimestampMetadataInterceptor` is always enable
> > > > >
> > > > > Therefore, I open this PR(https://github.com/apache/pulsar/pull/21835)
> > > to
> > > > > always set the broker timestamp for messages on the broker side.
> > > > >
> > > > > With this change , firstly we should deprecate
> > > > > AppendBrokerTimestamp

Re: [DISCUSS] Always set a broker side timestamp for message and deprecate some API

2024-01-17 Thread PengHui Li
> We should also improve the TTL:
> When client publish time > Ledger create time + Ledger rollover time,
> and `brokerPublishTime` is not set,
> we can make Ledger TTL time =  Ledger create time + Ledger rollover time.

> This change can make sure entry expires when client clock is not right.

This solution looks good to me.

Regards,
Penghui


On Wed, Jan 17, 2024 at 10:31 PM Yunze Xu  wrote:

> From my perspective, it's a serious security issue. The client side
> could easily make Pulsar's message expiry mechanism not work. Even if
> it's not a malicious change made by users, it would be hard to figure
> out what makes the message not expired.
>
> > Users can still have a way to disable it if they care about the
> additional metadata stored in each entry.
>
> I don't think we should provide such an option for users. Compared
> with the serious security issue, the extra overhead (13 bytes) per
> entry (not per message) should never be a concern.
>
> My suggestion is that the broker entry metadata with the broker side
> timestamp should always be generated and we should not allow users to
> disable it. However, `hasBrokerPublishTime` should not be deprecated.
> The broker entry metadata won't be transferred to the client unless
> `exposingBrokerEntryMetadataToClientEnabled` is true.
>
> Thanks,
> Yunze
>
> On Wed, Jan 17, 2024 at 6:45 PM Lin Lin  wrote:
> >
> > Only enabled `AppendBrokerTimestampMetadataInterceptor` is not enough.
> >
> > We should also improve the TTL:
> > When client publish time > Ledger create time + Ledger rollover time,
> > and `brokerPublishTime` is not set,
> > we can make Ledger TTL time =  Ledger create time + Ledger rollover time.
> >
> > This change can make sure entry expired when client clock is not right.
> >
> > On 2024/01/17 04:31:16 PengHui Li wrote:
> > > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not
> mean
> > > that they allow this bug.
> > > This is a configuration, not an API. It is difficult to use
> documentation
> > > to regulate user behavior.
> > >
> > > Actually, they need to know why they want to disable `
> > > AppendBrokerTimestampMetadataInterceptor`.
> > > Otherwise, why do they want to disable `AppendBrokerTimestampMetadata
> > > Interceptor`?
> > >
> > > Pulsar's default configuration tries to provide a best practice for
> most of
> > > the cases. To avoid the potential
> > > problem, Pulsar enables  `AppendBrokerTimestampMetadataInterceptor` by
> > > default. But it doesn't mean all the cases
> > > should enable `AppendBrokerTimestampMetadataInterceptor`. If users
> don't
> > > use TTL, the producers are
> > > well-maintained. Is there any reason to have at least 16 bytes append
> to
> > > each entry? The default configuration
> > > behavior change also needs to be highlighted in the release note. Users
> > > need to know if they don't disable the
> > > `AppendBrokerTimestampMetadataInterceptor` manually on a newer
> version, the
> > > retained data size will increase.
> > >
> > > BTW, we need to explain each interceptor in the `broker.conf` and why `
> > > AppendBrokerTimestampMetadataInterceptor`
> > > is enabled by default. If users don't want to read it, change anything
> in
> > > `broker.conf` they don't really know what it is.
> > > How can they know what they expect?
> > >
> > > Regards,
> > > Penghui
> > >
> > >
> > > On Mon, Jan 15, 2024 at 11:09 PM Lin Lin  wrote:
> > >
> > > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not
> mean
> > > > that they allow this bug.
> > > > This is a configuration, not an API. It is difficult to use
> documentation
> > > > to regulate user behavior.
> > > >
> > > > Maybe we can add a new field (BrokerTimestamp) to save the timestamp
> on
> > > > the Broker side.
> > > > The time priority for trimming Ledger is as follows:
> > > >
> > > > BrokerPublishTime > BrokerTimestamp
> > > >
> > > > If `BrokerPublishTime` exists, `BrokerReceiveTime` is not set.
> > > > If not, we set `BrokerReceiveTime`  and is no longer affected by
> client
> > > > time.
> > > >
> > > > On 2024/01/15 02:15:17 PengHui Li wrote:
> > > > > IMO, we should enable `AppendBrokerTimestampMetadataInterceptor` by
> > > > default.
> > > > > Users can still have a way to disable it if they care about the
> > > > additional
> > > > > metadata stored
> > > > > in each entry.
> > > > >
> > > > > For the `hasBrokerPublishTime` API. The topic might also have
> historical
> > > > > data without
> > > > > broker publish time. So, it should be fine to keep this API
> because we
> > > > > don't know how
> > > > > long users will retain their data.
> > > > >
> > > > > Regards,
> > > > > Penghui
> > > > >
> > > > > On Sat, Jan 6, 2024 at 10:35 PM linlin  wrote:
> > > > >
> > > > > > Now, if the message's metadata does not set a broker side
> timestamp,
> > > > the
> > > > > > ledger expiration check is based on the client's publish time.
> > > > > >
> > > > > > When the client machine's clock is incorrect (eg: set to 

Re: [DISCUSS] Always set a broker side timestamp for message and deprecate some API

2024-01-17 Thread PengHui Li
> I don't think we should provide such an option for users. Compared
with the serious security issue, the extra overhead (13 bytes) per
entry (not per message) should never be a concern.

It actually changed the stored data format. As far as I know, there are
some users
who read data from the offloaded data files directly. So, at least, we
should provide
a way to allow them to avoid the stored data format change.

And 13 bytes are also related to the use case. If users don't use TTL and
send
too many small messages without batch. The cost will be increased by 10% or
20%.
The issue will not damage the data or cause the data loss.
You can monitor it, to skip the backlogs. TTL is not the only way to evict
backlogs.

Thanks,
Penghui

On Thu, Jan 18, 2024 at 10:33 AM PengHui Li  wrote:

> > We should also improve the TTL:
> > When client publish time > Ledger create time + Ledger rollover time,
> > and `brokerPublishTime` is not set,
> > we can make Ledger TTL time =  Ledger create time + Ledger rollover
> time.
>
> > This change can make sure entry expires when client clock is not right.
>
> This solution looks good to me.
>
> Regards,
> Penghui
>
>
> On Wed, Jan 17, 2024 at 10:31 PM Yunze Xu  wrote:
>
>> From my perspective, it's a serious security issue. The client side
>> could easily make Pulsar's message expiry mechanism not work. Even if
>> it's not a malicious change made by users, it would be hard to figure
>> out what makes the message not expired.
>>
>> > Users can still have a way to disable it if they care about the
>> additional metadata stored in each entry.
>>
>> I don't think we should provide such an option for users. Compared
>> with the serious security issue, the extra overhead (13 bytes) per
>> entry (not per message) should never be a concern.
>>
>> My suggestion is that the broker entry metadata with the broker side
>> timestamp should always be generated and we should not allow users to
>> disable it. However, `hasBrokerPublishTime` should not be deprecated.
>> The broker entry metadata won't be transferred to the client unless
>> `exposingBrokerEntryMetadataToClientEnabled` is true.
>>
>> Thanks,
>> Yunze
>>
>> On Wed, Jan 17, 2024 at 6:45 PM Lin Lin  wrote:
>> >
>> > Only enabled `AppendBrokerTimestampMetadataInterceptor` is not enough.
>> >
>> > We should also improve the TTL:
>> > When client publish time > Ledger create time + Ledger rollover time,
>> > and `brokerPublishTime` is not set,
>> > we can make Ledger TTL time =  Ledger create time + Ledger rollover
>> time.
>> >
>> > This change can make sure entry expired when client clock is not right.
>> >
>> > On 2024/01/17 04:31:16 PengHui Li wrote:
>> > > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not
>> mean
>> > > that they allow this bug.
>> > > This is a configuration, not an API. It is difficult to use
>> documentation
>> > > to regulate user behavior.
>> > >
>> > > Actually, they need to know why they want to disable `
>> > > AppendBrokerTimestampMetadataInterceptor`.
>> > > Otherwise, why do they want to disable `AppendBrokerTimestampMetadata
>> > > Interceptor`?
>> > >
>> > > Pulsar's default configuration tries to provide a best practice for
>> most of
>> > > the cases. To avoid the potential
>> > > problem, Pulsar enables  `AppendBrokerTimestampMetadataInterceptor` by
>> > > default. But it doesn't mean all the cases
>> > > should enable `AppendBrokerTimestampMetadataInterceptor`. If users
>> don't
>> > > use TTL, the producers are
>> > > well-maintained. Is there any reason to have at least 16 bytes append
>> to
>> > > each entry? The default configuration
>> > > behavior change also needs to be highlighted in the release note.
>> Users
>> > > need to know if they don't disable the
>> > > `AppendBrokerTimestampMetadataInterceptor` manually on a newer
>> version, the
>> > > retained data size will increase.
>> > >
>> > > BTW, we need to explain each interceptor in the `broker.conf` and why
>> `
>> > > AppendBrokerTimestampMetadataInterceptor`
>> > > is enabled by default. If users don't want to read it, change
>> anything in
>> > > `broker.conf` they don't really know what it is.
>> > > How can they know what they expect?
>> > >
>> > > Regards,
>> > > Penghui
>> > >
>> > >
>> > > On Mon, Jan 15, 2024 at 11:09 PM Lin Lin  wrote:
>> > >
>> > > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not
>> mean
>> > > > that they allow this bug.
>> > > > This is a configuration, not an API. It is difficult to use
>> documentation
>> > > > to regulate user behavior.
>> > > >
>> > > > Maybe we can add a new field (BrokerTimestamp) to save the
>> timestamp on
>> > > > the Broker side.
>> > > > The time priority for trimming Ledger is as follows:
>> > > >
>> > > > BrokerPublishTime > BrokerTimestamp
>> > > >
>> > > > If `BrokerPublishTime` exists, `BrokerReceiveTime` is not set.
>> > > > If not, we set `BrokerReceiveTime`  and is no longer affected by
>> client
>> > > > time.
>> > > >
>> > > > On 

Re: [VOTE] PIP-330: getMessagesById gets all messages

2024-01-17 Thread PengHui Li
+1 (binding)

Regards,
Penghui

On Wed, Jan 17, 2024 at 8:03 PM Yunze Xu  wrote:

> +1 (binding)
>
> Thanks,
> Yunze
>
> On Wed, Jan 17, 2024 at 6:55 PM guo jiwei  wrote:
> >
> > +1 (binding)
> >
> >
> > Regards
> > Jiwei Guo (Tboy)
> >
> >
> > On Wed, Jan 17, 2024 at 5:39 PM 太上玄元道君  wrote:
> >
> > > +1 nonbinding
> > >
> > >
> > > > 2024年1月17日 17:35,Asaf Mesika  写道:
> > > >
> > > > +1 (non-binding)
> > > >
> > > > On Tue, Jan 16, 2024 at 4:43 AM Dezhi Liu 
> wrote:
> > > >
> > > >> +1 (non-binding)
> > > >>
> > > >> Thanks,
> > > >> Dezhi Liu
> > > >>
> > > >> On 2024/01/15 09:33:48 Zixuan Liu wrote:
> > > >>> Hi Pulsar Community,
> > > >>>
> > > >>> Voting for PIP-330: getMessagesById gets all messages
> > > >>>
> > > >>> PIP: https://github.com/apache/pulsar/pull/21873
> > > >>> Discussion thread:
> > > >>> https://lists.apache.org/thread/vqyh3mvtvovd383sd8zxnlzsspdr863z
> > > >>>
> > > >>> Thanks,
> > > >>> Zixuan
> > > >>>
> > > >>
> > >
> > >
>


Re: [DISCUSS] Always set a broker side timestamp for message and deprecate some API

2024-01-17 Thread Yunze Xu
I see, it only affects the TTL. We can acknowledge the messages to
clean up the data. Then we have to document the `brokerInterceptors`
config clearly to say TTL might not work if
`AppendBrokerTimestampMetadataInterceptor` is not configured.

Thanks,
Yunze

On Thu, Jan 18, 2024 at 10:41 AM PengHui Li  wrote:
>
> > I don't think we should provide such an option for users. Compared
> with the serious security issue, the extra overhead (13 bytes) per
> entry (not per message) should never be a concern.
>
> It actually changed the stored data format. As far as I know, there are
> some users
> who read data from the offloaded data files directly. So, at least, we
> should provide
> a way to allow them to avoid the stored data format change.
>
> And 13 bytes are also related to the use case. If users don't use TTL and
> send
> too many small messages without batch. The cost will be increased by 10% or
> 20%.
> The issue will not damage the data or cause the data loss.
> You can monitor it, to skip the backlogs. TTL is not the only way to evict
> backlogs.
>
> Thanks,
> Penghui
>
> On Thu, Jan 18, 2024 at 10:33 AM PengHui Li  wrote:
>
> > > We should also improve the TTL:
> > > When client publish time > Ledger create time + Ledger rollover time,
> > > and `brokerPublishTime` is not set,
> > > we can make Ledger TTL time =  Ledger create time + Ledger rollover
> > time.
> >
> > > This change can make sure entry expires when client clock is not right.
> >
> > This solution looks good to me.
> >
> > Regards,
> > Penghui
> >
> >
> > On Wed, Jan 17, 2024 at 10:31 PM Yunze Xu  wrote:
> >
> >> From my perspective, it's a serious security issue. The client side
> >> could easily make Pulsar's message expiry mechanism not work. Even if
> >> it's not a malicious change made by users, it would be hard to figure
> >> out what makes the message not expired.
> >>
> >> > Users can still have a way to disable it if they care about the
> >> additional metadata stored in each entry.
> >>
> >> I don't think we should provide such an option for users. Compared
> >> with the serious security issue, the extra overhead (13 bytes) per
> >> entry (not per message) should never be a concern.
> >>
> >> My suggestion is that the broker entry metadata with the broker side
> >> timestamp should always be generated and we should not allow users to
> >> disable it. However, `hasBrokerPublishTime` should not be deprecated.
> >> The broker entry metadata won't be transferred to the client unless
> >> `exposingBrokerEntryMetadataToClientEnabled` is true.
> >>
> >> Thanks,
> >> Yunze
> >>
> >> On Wed, Jan 17, 2024 at 6:45 PM Lin Lin  wrote:
> >> >
> >> > Only enabled `AppendBrokerTimestampMetadataInterceptor` is not enough.
> >> >
> >> > We should also improve the TTL:
> >> > When client publish time > Ledger create time + Ledger rollover time,
> >> > and `brokerPublishTime` is not set,
> >> > we can make Ledger TTL time =  Ledger create time + Ledger rollover
> >> time.
> >> >
> >> > This change can make sure entry expired when client clock is not right.
> >> >
> >> > On 2024/01/17 04:31:16 PengHui Li wrote:
> >> > > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not
> >> mean
> >> > > that they allow this bug.
> >> > > This is a configuration, not an API. It is difficult to use
> >> documentation
> >> > > to regulate user behavior.
> >> > >
> >> > > Actually, they need to know why they want to disable `
> >> > > AppendBrokerTimestampMetadataInterceptor`.
> >> > > Otherwise, why do they want to disable `AppendBrokerTimestampMetadata
> >> > > Interceptor`?
> >> > >
> >> > > Pulsar's default configuration tries to provide a best practice for
> >> most of
> >> > > the cases. To avoid the potential
> >> > > problem, Pulsar enables  `AppendBrokerTimestampMetadataInterceptor` by
> >> > > default. But it doesn't mean all the cases
> >> > > should enable `AppendBrokerTimestampMetadataInterceptor`. If users
> >> don't
> >> > > use TTL, the producers are
> >> > > well-maintained. Is there any reason to have at least 16 bytes append
> >> to
> >> > > each entry? The default configuration
> >> > > behavior change also needs to be highlighted in the release note.
> >> Users
> >> > > need to know if they don't disable the
> >> > > `AppendBrokerTimestampMetadataInterceptor` manually on a newer
> >> version, the
> >> > > retained data size will increase.
> >> > >
> >> > > BTW, we need to explain each interceptor in the `broker.conf` and why
> >> `
> >> > > AppendBrokerTimestampMetadataInterceptor`
> >> > > is enabled by default. If users don't want to read it, change
> >> anything in
> >> > > `broker.conf` they don't really know what it is.
> >> > > How can they know what they expect?
> >> > >
> >> > > Regards,
> >> > > Penghui
> >> > >
> >> > >
> >> > > On Mon, Jan 15, 2024 at 11:09 PM Lin Lin  wrote:
> >> > >
> >> > > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not
> >> mean
> >> > > > that they allow this bug.
> >> > > > 

Re: [VERIFY] Pulsar Release 3.2.0 Candidate 2

2024-01-17 Thread PengHui Li
It looks like something broke the behavior of the querystate from Pulsar
Functions.

1. The querystate exit with `Reason: key 'hello' doesn't exist`, which the
old version will not exit

```
lipenghui@lipenghuis-MacBook-Pro apache-pulsar-3.2.0 % bin/pulsar-admin
functions querystate --tenant test --namespace test-namespace --name
word_count -k hello -w
key 'hello' doesn't exist.

Reason: key 'hello' doesn't exist.
```

2. The output missed the `version` field

```
  "key": "hello",
  "numberValue": 20,
  "version": 19
```

```
  "key": "hello",
  "stringValue": "\u\u\u\u\u\u\u\n",
  "numberValue": 10
```

Regards,
Penghui

On Tue, Jan 16, 2024 at 6:34 PM Zixuan Liu  wrote:

> +1 (non-binding)
>
> - Checked the checksums and signatures
> - Built with Temurin-17.0.6+10
> - Run standalone
> - Checked producer and consumer
>
> Thanks,
> Zixuan
>
> guo jiwei  于2024年1月16日周二 15:57写道:
>
> > This is the second release candidate for Apache Pulsar version 3.2.0.
> >
> > It fixes the following issues:
> > https://github.com/apache/pulsar/milestone/36?closed=1
> >
> > *** Please download, test and verify on this release. This release
> > candidate verification will stay open until Jan 15 ***
> >
> > Note that we are verifying upon the source (tag), binaries are provided
> for
> > convenience.
> >
> > Source and binary files:
> > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-3.2.0-candidate-2/
> >
> > SHA-512 checksums:
> >
> >
> >
> 57af4de0531baada79ff3eb5e51659ae3d2b6732840a27861636d3538d8cca2b3aeb77d3f5cecf37dc5d8a0de11a05d0775ae2d2ca9398f9747fbe662e1cdeae
> >
> > apache-pulsar-3.2.0-bin.tar.gz
> >
> >
> >
> 7ddd8df261f4ffeed2e4f029aa475f523bb9505869870db681a57b7df25af01d5670ba00bc7e062ddae0b962eaed9ee828a4cd7f4744a965afd77e9ae1239d6a
> >
> > apache-pulsar-3.2.0-src.tar.gz
> >
> > Maven staging repo:
> > https://repository.apache.org/content/repositories/orgapachepulsar-1262/
> >
> > The tag to verify:
> > v3.2.0-candidate-2 (5348e1b9124052a454f66769ae3e9f54ee0a75d4)
> > https://github.com/apache/pulsar/commits/v3.2.0-candidate-2/
> >
> > Pulsar's KEYS file containing PGP keys you use to sign the release:
> > https://dist.apache.org/repos/dist/dev/pulsar/KEYS
> >
> > Docker images:
> >
> > pulsar images:
> >
> >
> https://hub.docker.com/layers/technoboy8/pulsar/3.2.0-5348e1b/images/sha256-cf02a08e588ca493d10dbe472b72f5e64e4f90b8b5ae01e91321f336776a5a4e?context=repo
> > <
> >
> https://hub.docker.com/layers/mattison/pulsar/3.1.0-candidate-1/images/sha256-0efbaad7d893cc5041a46a2d4d56432bda855ae4068a38349777d1be6e98d27d?context=explore
> > >
> > pulsar-all images:
> >
> >
> https://hub.docker.com/layers/technoboy8/pulsar-all/3.2.0-5348e1b/images/sha256-801e395fc26257e2beb5ea0574fe27f5a4ff956c85cbcd6b362432d6dda28b95?context=repo
> >
> > Please download the source package, and follow the README to build
> > and run the Pulsar standalone service.
> >
> > Note that this RC doesn't require a formal vote, but we would also
> > appreciate your feedback with +1/-1. And please provide specific
> > comments if your feedback is not +1.
> >
> >
> > Regards
> > Jiwei Guo (Tboy)
> >
>


Re: [DISCUSS] Always set a broker side timestamp for message and deprecate some API

2024-01-17 Thread PengHui Li
> I see, it only affects the TTL. We can acknowledge the messages to
clean up the data. Then we have to document the `brokerInterceptors`
config clearly to say TTL might not work if
`AppendBrokerTimestampMetadataInterceptor` is not configured.

Yes, and LinLin figured out a better solution to mitigate this issue.

> We should also improve the TTL:
> When client publish time > Ledger create time + Ledger rollover time,
> and `brokerPublishTime` is not set,
> we can make Ledger TTL time =  Ledger create time + Ledger rollover time.

BTW, I think we can use the timestamp in the Ledger Info directly. It's not
the ledger
creation time, it should be the close timestamp.

Regards,
Penghui


On Thu, Jan 18, 2024 at 11:25 AM Yunze Xu  wrote:

> I see, it only affects the TTL. We can acknowledge the messages to
> clean up the data. Then we have to document the `brokerInterceptors`
> config clearly to say TTL might not work if
> `AppendBrokerTimestampMetadataInterceptor` is not configured.
>
> Thanks,
> Yunze
>
> On Thu, Jan 18, 2024 at 10:41 AM PengHui Li  wrote:
> >
> > > I don't think we should provide such an option for users. Compared
> > with the serious security issue, the extra overhead (13 bytes) per
> > entry (not per message) should never be a concern.
> >
> > It actually changed the stored data format. As far as I know, there are
> > some users
> > who read data from the offloaded data files directly. So, at least, we
> > should provide
> > a way to allow them to avoid the stored data format change.
> >
> > And 13 bytes are also related to the use case. If users don't use TTL and
> > send
> > too many small messages without batch. The cost will be increased by 10%
> or
> > 20%.
> > The issue will not damage the data or cause the data loss.
> > You can monitor it, to skip the backlogs. TTL is not the only way to
> evict
> > backlogs.
> >
> > Thanks,
> > Penghui
> >
> > On Thu, Jan 18, 2024 at 10:33 AM PengHui Li  wrote:
> >
> > > > We should also improve the TTL:
> > > > When client publish time > Ledger create time + Ledger rollover time,
> > > > and `brokerPublishTime` is not set,
> > > > we can make Ledger TTL time =  Ledger create time + Ledger rollover
> > > time.
> > >
> > > > This change can make sure entry expires when client clock is not
> right.
> > >
> > > This solution looks good to me.
> > >
> > > Regards,
> > > Penghui
> > >
> > >
> > > On Wed, Jan 17, 2024 at 10:31 PM Yunze Xu  wrote:
> > >
> > >> From my perspective, it's a serious security issue. The client side
> > >> could easily make Pulsar's message expiry mechanism not work. Even if
> > >> it's not a malicious change made by users, it would be hard to figure
> > >> out what makes the message not expired.
> > >>
> > >> > Users can still have a way to disable it if they care about the
> > >> additional metadata stored in each entry.
> > >>
> > >> I don't think we should provide such an option for users. Compared
> > >> with the serious security issue, the extra overhead (13 bytes) per
> > >> entry (not per message) should never be a concern.
> > >>
> > >> My suggestion is that the broker entry metadata with the broker side
> > >> timestamp should always be generated and we should not allow users to
> > >> disable it. However, `hasBrokerPublishTime` should not be deprecated.
> > >> The broker entry metadata won't be transferred to the client unless
> > >> `exposingBrokerEntryMetadataToClientEnabled` is true.
> > >>
> > >> Thanks,
> > >> Yunze
> > >>
> > >> On Wed, Jan 17, 2024 at 6:45 PM Lin Lin  wrote:
> > >> >
> > >> > Only enabled `AppendBrokerTimestampMetadataInterceptor` is not
> enough.
> > >> >
> > >> > We should also improve the TTL:
> > >> > When client publish time > Ledger create time + Ledger rollover
> time,
> > >> > and `brokerPublishTime` is not set,
> > >> > we can make Ledger TTL time =  Ledger create time + Ledger rollover
> > >> time.
> > >> >
> > >> > This change can make sure entry expired when client clock is not
> right.
> > >> >
> > >> > On 2024/01/17 04:31:16 PengHui Li wrote:
> > >> > > > User disabling `AppendBrokerTimestampMetadataInterceptor` does
> not
> > >> mean
> > >> > > that they allow this bug.
> > >> > > This is a configuration, not an API. It is difficult to use
> > >> documentation
> > >> > > to regulate user behavior.
> > >> > >
> > >> > > Actually, they need to know why they want to disable `
> > >> > > AppendBrokerTimestampMetadataInterceptor`.
> > >> > > Otherwise, why do they want to disable
> `AppendBrokerTimestampMetadata
> > >> > > Interceptor`?
> > >> > >
> > >> > > Pulsar's default configuration tries to provide a best practice
> for
> > >> most of
> > >> > > the cases. To avoid the potential
> > >> > > problem, Pulsar enables
> `AppendBrokerTimestampMetadataInterceptor` by
> > >> > > default. But it doesn't mean all the cases
> > >> > > should enable `AppendBrokerTimestampMetadataInterceptor`. If users
> > >> don't
> > >> > > use TTL, the producers are
> > >> > > well-maintained. Is there

Re: [VOTE] PIP-330: getMessagesById gets all messages

2024-01-17 Thread Zixuan Liu
This PIP has been approved by 3 +1 (binding):

- Jiwei
- Yunze
- Penghui

Thanks,
Zixuan

Zixuan Liu  于2024年1月15日周一 17:33写道:

> Hi Pulsar Community,
>
> Voting for PIP-330: getMessagesById gets all messages
>
> PIP: https://github.com/apache/pulsar/pull/21873
> Discussion thread:
> https://lists.apache.org/thread/vqyh3mvtvovd383sd8zxnlzsspdr863z
>
> Thanks,
> Zixuan
>


[VOTE] Release Apache Pulsar Helm Chart 3.2.0 based on 3.2.0-candidate-2

2024-01-17 Thread Lari Hotari
Hello Apache Pulsar Community,

This is a call for the vote to release the Apache Pulsar Helm Chart version 
3.2.0.

Release notes for 3.2.0-candidate-2:
https://github.com/apache/pulsar-helm-chart/releases/tag/pulsar-3.2.0-candidate-2

The release candidate is available at:
https://dist.apache.org/repos/dist/dev/pulsar/helm-chart/3.2.0-candidate-2/

pulsar-chart-3.2.0-source.tar.gz - is the "main source release".
pulsar-3.2.0.tgz - is the binary Helm Chart release.

Public keys are available at: https://www.apache.org/dist/pulsar/KEYS

For convenience "index.yaml" has been uploaded (though excluded from voting), 
so you can also run the below commands.

helm repo add --force-update apache-pulsar-dist-dev 
https://dist.apache.org/repos/dist/dev/pulsar/helm-chart/3.2.0-candidate-2/
helm repo update
helm install pulsar apache-pulsar-dist-dev/pulsar --version 3.2.0 --set 
affinity.anti_affinity=false

pulsar-3.2.0.tgz.prov - is also uploaded for verifying Chart Integrity, though 
it is not strictly required for releasing the artifact based on ASF Guidelines. 

You can optionally verify this file using this helm plugin 
https://github.com/technosophos/helm-gpg, or by using helm --verify 
(https://helm.sh/docs/helm/helm_verify/).

helm fetch --prov apache-pulsar-dist-dev/pulsar
helm plugin install https://github.com/technosophos/helm-gpg
helm gpg verify pulsar-3.2.0.tgz

The vote will be open for at least 72 hours.

Only votes from PMC members are binding, but members of the community are
encouraged to test the release and vote with "(non-binding)".

For license checks, the .rat-excludes files is included, so you can run the 
following to verify licenses (just update ):

tar -xvf pulsar-chart-3.2.0-source.tar.gz
cd pulsar-chart-3.2.0
java -jar /apache-rat-0.15/apache-rat-0.15.jar . -E .rat-excludes

Please note that the version number excludes the `-candidate-X` string, so it's 
now
simply 3.2.0. This will allow us to rename the artifact without modifying
the artifact checksums when we actually release it.

Thanks,
Lari


Re: [VOTE] Release Apache Pulsar Helm Chart 3.2.0 based on 3.2.0-candidate-1

2024-01-17 Thread Lari Hotari
This release vote is cancelled since there's a new vote out for Apache Pulsar 
Helm Chart 3.2.0-candidate-2:
https://lists.apache.org/thread/f0zn9882wmkwcm768v1h38ys6z5qncto

-Lari

On 2024/01/17 10:07:32 Lari Hotari wrote:
> Hello Apache Pulsar Community,
> 
> This is a call for the vote to release the Apache Pulsar Helm Chart version 
> 3.2.0.
> 
> Release notes for 3.2.0-candidate-1:
> https://github.com/apache/pulsar-helm-chart/releases/tag/pulsar-3.2.0-candidate-1
> 
> The release candidate is available at:
> https://dist.apache.org/repos/dist/dev/pulsar/helm-chart/3.2.0-candidate-1/
> 
> pulsar-chart-3.2.0-source.tar.gz - is the "main source release".
> pulsar-3.2.0.tgz - is the binary Helm Chart release.
> 
> Public keys are available at: https://www.apache.org/dist/pulsar/KEYS
> 
> For convenience "index.yaml" has been uploaded (though excluded from voting), 
> so you can also run the below commands.
> 
> helm repo add --force-update apache-pulsar-dist-dev 
> https://dist.apache.org/repos/dist/dev/pulsar/helm-chart/3.2.0-candidate-1/
> helm repo update
> helm install pulsar apache-pulsar-dist-dev/pulsar --version 3.2.0 --set 
> affinity.anti_affinity=false
> 
> pulsar-3.2.0.tgz.prov - is also uploaded for verifying Chart Integrity, 
> though it is not strictly required for releasing the artifact based on ASF 
> Guidelines. 
> 
> You can optionally verify this file using this helm plugin 
> https://github.com/technosophos/helm-gpg, or by using helm --verify 
> (https://helm.sh/docs/helm/helm_verify/).
> 
> helm fetch --prov apache-pulsar-dist-dev/pulsar
> helm plugin install https://github.com/technosophos/helm-gpg
> helm gpg verify pulsar-3.2.0.tgz
> 
> The vote will be open for at least 72 hours.
> 
> Only votes from PMC members are binding, but members of the community are
> encouraged to test the release and vote with "(non-binding)".
> 
> For license checks, the .rat-excludes files is included, so you can run the 
> following to verify licenses (just update ):
> 
> tar -xvf pulsar-chart-3.2.0-source.tar.gz
> cd pulsar-chart-3.2.0
> java -jar /apache-rat-0.15/apache-rat-0.15.jar . -E .rat-excludes
> 
> Please note that the version number excludes the `-candidate-X` string, so 
> it's now
> simply 3.2.0. This will allow us to rename the artifact without modifying
> the artifact checksums when we actually release it.
> 
> Thanks,
> Lari
>