> I don't think we should provide such an option for users. Compared
with the serious security issue, the extra overhead (13 bytes) per
entry (not per message) should never be a concern.

It actually changed the stored data format. As far as I know, there are
some users
who read data from the offloaded data files directly. So, at least, we
should provide
a way to allow them to avoid the stored data format change.

And 13 bytes are also related to the use case. If users don't use TTL and
send
too many small messages without batch. The cost will be increased by 10% or
20%.
The issue will not damage the data or cause the data loss.
You can monitor it, to skip the backlogs. TTL is not the only way to evict
backlogs.

Thanks,
Penghui

On Thu, Jan 18, 2024 at 10:33 AM PengHui Li <peng...@apache.org> wrote:

> > We should also improve the TTL:
> > When client publish time > Ledger create time + Ledger rollover time,
> > and `brokerPublishTime` is not set,
> > we can make Ledger TTL time =  Ledger create time + Ledger rollover
> time.
>
> > This change can make sure entry expires when client clock is not right.
>
> This solution looks good to me.
>
> Regards,
> Penghui
>
>
> On Wed, Jan 17, 2024 at 10:31 PM Yunze Xu <x...@apache.org> wrote:
>
>> From my perspective, it's a serious security issue. The client side
>> could easily make Pulsar's message expiry mechanism not work. Even if
>> it's not a malicious change made by users, it would be hard to figure
>> out what makes the message not expired.
>>
>> > Users can still have a way to disable it if they care about the
>> additional metadata stored in each entry.
>>
>> I don't think we should provide such an option for users. Compared
>> with the serious security issue, the extra overhead (13 bytes) per
>> entry (not per message) should never be a concern.
>>
>> My suggestion is that the broker entry metadata with the broker side
>> timestamp should always be generated and we should not allow users to
>> disable it. However, `hasBrokerPublishTime` should not be deprecated.
>> The broker entry metadata won't be transferred to the client unless
>> `exposingBrokerEntryMetadataToClientEnabled` is true.
>>
>> Thanks,
>> Yunze
>>
>> On Wed, Jan 17, 2024 at 6:45 PM Lin Lin <lin...@apache.org> wrote:
>> >
>> > Only enabled `AppendBrokerTimestampMetadataInterceptor` is not enough.
>> >
>> > We should also improve the TTL:
>> > When client publish time > Ledger create time + Ledger rollover time,
>> > and `brokerPublishTime` is not set,
>> > we can make Ledger TTL time =  Ledger create time + Ledger rollover
>> time.
>> >
>> > This change can make sure entry expired when client clock is not right.
>> >
>> > On 2024/01/17 04:31:16 PengHui Li wrote:
>> > > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not
>> mean
>> > > that they allow this bug.
>> > > This is a configuration, not an API. It is difficult to use
>> documentation
>> > > to regulate user behavior.
>> > >
>> > > Actually, they need to know why they want to disable `
>> > > AppendBrokerTimestampMetadataInterceptor`.
>> > > Otherwise, why do they want to disable `AppendBrokerTimestampMetadata
>> > > Interceptor`?
>> > >
>> > > Pulsar's default configuration tries to provide a best practice for
>> most of
>> > > the cases. To avoid the potential
>> > > problem, Pulsar enables  `AppendBrokerTimestampMetadataInterceptor` by
>> > > default. But it doesn't mean all the cases
>> > > should enable `AppendBrokerTimestampMetadataInterceptor`. If users
>> don't
>> > > use TTL, the producers are
>> > > well-maintained. Is there any reason to have at least 16 bytes append
>> to
>> > > each entry? The default configuration
>> > > behavior change also needs to be highlighted in the release note.
>> Users
>> > > need to know if they don't disable the
>> > > `AppendBrokerTimestampMetadataInterceptor` manually on a newer
>> version, the
>> > > retained data size will increase.
>> > >
>> > > BTW, we need to explain each interceptor in the `broker.conf` and why
>> `
>> > > AppendBrokerTimestampMetadataInterceptor`
>> > > is enabled by default. If users don't want to read it, change
>> anything in
>> > > `broker.conf` they don't really know what it is.
>> > > How can they know what they expect?
>> > >
>> > > Regards,
>> > > Penghui
>> > >
>> > >
>> > > On Mon, Jan 15, 2024 at 11:09 PM Lin Lin <lin...@apache.org> wrote:
>> > >
>> > > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not
>> mean
>> > > > that they allow this bug.
>> > > > This is a configuration, not an API. It is difficult to use
>> documentation
>> > > > to regulate user behavior.
>> > > >
>> > > > Maybe we can add a new field (BrokerTimestamp) to save the
>> timestamp on
>> > > > the Broker side.
>> > > > The time priority for trimming Ledger is as follows:
>> > > >
>> > > > BrokerPublishTime > BrokerTimestamp
>> > > >
>> > > > If `BrokerPublishTime` exists, `BrokerReceiveTime` is not set.
>> > > > If not, we set `BrokerReceiveTime`  and is no longer affected by
>> client
>> > > > time.
>> > > >
>> > > > On 2024/01/15 02:15:17 PengHui Li wrote:
>> > > > > IMO, we should enable `AppendBrokerTimestampMetadataInterceptor`
>> by
>> > > > default.
>> > > > > Users can still have a way to disable it if they care about the
>> > > > additional
>> > > > > metadata stored
>> > > > > in each entry.
>> > > > >
>> > > > > For the `hasBrokerPublishTime` API. The topic might also have
>> historical
>> > > > > data without
>> > > > > broker publish time. So, it should be fine to keep this API
>> because we
>> > > > > don't know how
>> > > > > long users will retain their data.
>> > > > >
>> > > > > Regards,
>> > > > > Penghui
>> > > > >
>> > > > > On Sat, Jan 6, 2024 at 10:35 PM linlin <lin...@apache.org> wrote:
>> > > > >
>> > > > > > Now, if the message's metadata does not set a broker side
>> timestamp,
>> > > > the
>> > > > > > ledger expiration check is based on the client's publish time.
>> > > > > >
>> > > > > > When the client machine's clock is incorrect (eg: set to 1 year
>> later)
>> > > > ,
>> > > > > > the ledger can not be cleaned up. Issue
>> > > > > > https://github.com/apache/pulsar/issues/21347
>> > > > > >
>> > > > > > `AppendBrokerTimestampMetadataInterceptor` can set timestamp for
>> > > > messages
>> > > > > > on the broker side, but we can not ensure that the
>> > > > > > `AppendBrokerTimestampMetadataInterceptor` is always enable
>> > > > > >
>> > > > > > Therefore, I open this PR(
>> https://github.com/apache/pulsar/pull/21835)
>> > > > to
>> > > > > > always set the broker timestamp for messages on the broker side.
>> > > > > >
>> > > > > > With this change , firstly we should deprecate
>> > > > > > AppendBrokerTimestampMetadataInterceptor.
>> > > > > > It no longer needs to exist
>> > > > > >
>> > > > > > Secondly, we should deprecate `hasBrokerPublishTime` in
>> interface
>> > > > Message.
>> > > > > > It always returns true.
>> > > > > > This API is created in PR (
>> https://github.com/apache/pulsar/pull/11553
>> > > > )
>> > > > > > This PR is for the client to obtain BrokerPublishTime, so the
>> > > > > > `hasBrokerPublishTime` API is not necessary.
>> > > > > >
>> > > > >
>> > > >
>> > >
>>
>

Reply via email to