I see, it only affects the TTL. We can acknowledge the messages to clean up the data. Then we have to document the `brokerInterceptors` config clearly to say TTL might not work if `AppendBrokerTimestampMetadataInterceptor` is not configured.
Thanks, Yunze On Thu, Jan 18, 2024 at 10:41 AM PengHui Li <peng...@apache.org> wrote: > > > I don't think we should provide such an option for users. Compared > with the serious security issue, the extra overhead (13 bytes) per > entry (not per message) should never be a concern. > > It actually changed the stored data format. As far as I know, there are > some users > who read data from the offloaded data files directly. So, at least, we > should provide > a way to allow them to avoid the stored data format change. > > And 13 bytes are also related to the use case. If users don't use TTL and > send > too many small messages without batch. The cost will be increased by 10% or > 20%. > The issue will not damage the data or cause the data loss. > You can monitor it, to skip the backlogs. TTL is not the only way to evict > backlogs. > > Thanks, > Penghui > > On Thu, Jan 18, 2024 at 10:33 AM PengHui Li <peng...@apache.org> wrote: > > > > We should also improve the TTL: > > > When client publish time > Ledger create time + Ledger rollover time, > > > and `brokerPublishTime` is not set, > > > we can make Ledger TTL time = Ledger create time + Ledger rollover > > time. > > > > > This change can make sure entry expires when client clock is not right. > > > > This solution looks good to me. > > > > Regards, > > Penghui > > > > > > On Wed, Jan 17, 2024 at 10:31 PM Yunze Xu <x...@apache.org> wrote: > > > >> From my perspective, it's a serious security issue. The client side > >> could easily make Pulsar's message expiry mechanism not work. Even if > >> it's not a malicious change made by users, it would be hard to figure > >> out what makes the message not expired. > >> > >> > Users can still have a way to disable it if they care about the > >> additional metadata stored in each entry. > >> > >> I don't think we should provide such an option for users. Compared > >> with the serious security issue, the extra overhead (13 bytes) per > >> entry (not per message) should never be a concern. > >> > >> My suggestion is that the broker entry metadata with the broker side > >> timestamp should always be generated and we should not allow users to > >> disable it. However, `hasBrokerPublishTime` should not be deprecated. > >> The broker entry metadata won't be transferred to the client unless > >> `exposingBrokerEntryMetadataToClientEnabled` is true. > >> > >> Thanks, > >> Yunze > >> > >> On Wed, Jan 17, 2024 at 6:45 PM Lin Lin <lin...@apache.org> wrote: > >> > > >> > Only enabled `AppendBrokerTimestampMetadataInterceptor` is not enough. > >> > > >> > We should also improve the TTL: > >> > When client publish time > Ledger create time + Ledger rollover time, > >> > and `brokerPublishTime` is not set, > >> > we can make Ledger TTL time = Ledger create time + Ledger rollover > >> time. > >> > > >> > This change can make sure entry expired when client clock is not right. > >> > > >> > On 2024/01/17 04:31:16 PengHui Li wrote: > >> > > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not > >> mean > >> > > that they allow this bug. > >> > > This is a configuration, not an API. It is difficult to use > >> documentation > >> > > to regulate user behavior. > >> > > > >> > > Actually, they need to know why they want to disable ` > >> > > AppendBrokerTimestampMetadataInterceptor`. > >> > > Otherwise, why do they want to disable `AppendBrokerTimestampMetadata > >> > > Interceptor`? > >> > > > >> > > Pulsar's default configuration tries to provide a best practice for > >> most of > >> > > the cases. To avoid the potential > >> > > problem, Pulsar enables `AppendBrokerTimestampMetadataInterceptor` by > >> > > default. But it doesn't mean all the cases > >> > > should enable `AppendBrokerTimestampMetadataInterceptor`. If users > >> don't > >> > > use TTL, the producers are > >> > > well-maintained. Is there any reason to have at least 16 bytes append > >> to > >> > > each entry? The default configuration > >> > > behavior change also needs to be highlighted in the release note. > >> Users > >> > > need to know if they don't disable the > >> > > `AppendBrokerTimestampMetadataInterceptor` manually on a newer > >> version, the > >> > > retained data size will increase. > >> > > > >> > > BTW, we need to explain each interceptor in the `broker.conf` and why > >> ` > >> > > AppendBrokerTimestampMetadataInterceptor` > >> > > is enabled by default. If users don't want to read it, change > >> anything in > >> > > `broker.conf` they don't really know what it is. > >> > > How can they know what they expect? > >> > > > >> > > Regards, > >> > > Penghui > >> > > > >> > > > >> > > On Mon, Jan 15, 2024 at 11:09 PM Lin Lin <lin...@apache.org> wrote: > >> > > > >> > > > User disabling `AppendBrokerTimestampMetadataInterceptor` does not > >> mean > >> > > > that they allow this bug. > >> > > > This is a configuration, not an API. It is difficult to use > >> documentation > >> > > > to regulate user behavior. > >> > > > > >> > > > Maybe we can add a new field (BrokerTimestamp) to save the > >> timestamp on > >> > > > the Broker side. > >> > > > The time priority for trimming Ledger is as follows: > >> > > > > >> > > > BrokerPublishTime > BrokerTimestamp > >> > > > > >> > > > If `BrokerPublishTime` exists, `BrokerReceiveTime` is not set. > >> > > > If not, we set `BrokerReceiveTime` and is no longer affected by > >> client > >> > > > time. > >> > > > > >> > > > On 2024/01/15 02:15:17 PengHui Li wrote: > >> > > > > IMO, we should enable `AppendBrokerTimestampMetadataInterceptor` > >> by > >> > > > default. > >> > > > > Users can still have a way to disable it if they care about the > >> > > > additional > >> > > > > metadata stored > >> > > > > in each entry. > >> > > > > > >> > > > > For the `hasBrokerPublishTime` API. The topic might also have > >> historical > >> > > > > data without > >> > > > > broker publish time. So, it should be fine to keep this API > >> because we > >> > > > > don't know how > >> > > > > long users will retain their data. > >> > > > > > >> > > > > Regards, > >> > > > > Penghui > >> > > > > > >> > > > > On Sat, Jan 6, 2024 at 10:35 PM linlin <lin...@apache.org> wrote: > >> > > > > > >> > > > > > Now, if the message's metadata does not set a broker side > >> timestamp, > >> > > > the > >> > > > > > ledger expiration check is based on the client's publish time. > >> > > > > > > >> > > > > > When the client machine's clock is incorrect (eg: set to 1 year > >> later) > >> > > > , > >> > > > > > the ledger can not be cleaned up. Issue > >> > > > > > https://github.com/apache/pulsar/issues/21347 > >> > > > > > > >> > > > > > `AppendBrokerTimestampMetadataInterceptor` can set timestamp for > >> > > > messages > >> > > > > > on the broker side, but we can not ensure that the > >> > > > > > `AppendBrokerTimestampMetadataInterceptor` is always enable > >> > > > > > > >> > > > > > Therefore, I open this PR( > >> https://github.com/apache/pulsar/pull/21835) > >> > > > to > >> > > > > > always set the broker timestamp for messages on the broker side. > >> > > > > > > >> > > > > > With this change , firstly we should deprecate > >> > > > > > AppendBrokerTimestampMetadataInterceptor. > >> > > > > > It no longer needs to exist > >> > > > > > > >> > > > > > Secondly, we should deprecate `hasBrokerPublishTime` in > >> interface > >> > > > Message. > >> > > > > > It always returns true. > >> > > > > > This API is created in PR ( > >> https://github.com/apache/pulsar/pull/11553 > >> > > > ) > >> > > > > > This PR is for the client to obtain BrokerPublishTime, so the > >> > > > > > `hasBrokerPublishTime` API is not necessary. > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > >