Re: [DISCUSS] Remove the limitation between backlog quota and retention policy

2024-04-10 Thread Yike Xiao
Here's a hint as to why this restriction was introduced.
After reviewing the current implementation of the retention policy, I found 
that it decides whether a ledger exceeds the retention size and should be 
deleted using the following logic (pseudocode):

```
long totalSizeToDelete = 0;
for (ledger in ledgersBeforeSlowestReaderLedgerId) {
  totalSizeToDelete += ledger.size;
  if (managedLedgerTotalSize - totalSizeToDelete >= retentionSize) {
    // add this ledger to ledgersToDelete list
  }
}
```

The actual code can be seen at: [1]

The issue lies in the condition 
`managedLedgerTotalSize - totalSizeToDelete >= retentionSize`, which uses 
`managedLedgerTotalSize` instead of 
`SUM(ledgerSize(ledgersBeforeSlowestReaderLedgerId))`. Because 
`managedLedgerTotalSize` also counts ledgers that have not been acknowledged 
yet, those ledgers are included in the retention size comparison as well.

Here is an example to illustrate the problem:
Suppose the current ManagedLedger has 5 ledgers, [1, 2, 3, 4, 5], each 1MB in 
size, and the retentionSize is set to 2MB. The current slowestReaderLedgerId 
is 3, so ledgers 1 and 2 are fully acknowledged and together hold exactly 2MB. 
They are expected to be retained, but the current implementation deletes both: 
the check evaluates to 5MB - 1MB = 4MB >= 2MB for ledger 1 and to 
5MB - 2MB = 3MB >= 2MB for ledger 2.
The retention policy will only actually take effect when the retentionSize 
exceeds the backlog quota.
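
The example can be replayed with a small, self-contained Java sketch. It is not 
the ManagedLedgerImpl code: the class name, the method names and the array-based 
ledger model are assumptions made for illustration. The first method follows the 
pseudocode above; the second swaps in the sum of the ledgers before the slowest 
reader, which is the alternative named above and not necessarily the fix Pulsar 
would adopt.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: ledgerSizes[i] is the size of ledger id (i + 1), and
// slowestReaderIndex is the array index of the slowest reader's ledger.
final class RetentionSizeSketch {

    /** Follows the pseudocode: compares against the size of the whole managed ledger. */
    static List<Integer> deletableUsingTotalSize(long[] ledgerSizes, int slowestReaderIndex,
                                                 long retentionSize) {
        long managedLedgerTotalSize = 0;
        for (long size : ledgerSizes) {
            managedLedgerTotalSize += size;
        }
        return select(ledgerSizes, slowestReaderIndex, managedLedgerTotalSize, retentionSize);
    }

    /** Alternative: compares only against the ledgers before the slowest reader. */
    static List<Integer> deletableUsingAckedSize(long[] ledgerSizes, int slowestReaderIndex,
                                                 long retentionSize) {
        long ackedSize = 0;
        for (int i = 0; i < slowestReaderIndex; i++) {
            ackedSize += ledgerSizes[i];
        }
        return select(ledgerSizes, slowestReaderIndex, ackedSize, retentionSize);
    }

    private static List<Integer> select(long[] ledgerSizes, int slowestReaderIndex,
                                        long baseSize, long retentionSize) {
        List<Integer> ledgersToDelete = new ArrayList<>();
        long totalSizeToDelete = 0;
        for (int i = 0; i < slowestReaderIndex; i++) {
            totalSizeToDelete += ledgerSizes[i];
            if (baseSize - totalSizeToDelete >= retentionSize) {
                ledgersToDelete.add(i + 1); // ledger ids are 1-based in the example
            }
        }
        return ledgersToDelete;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] sizes = {mb, mb, mb, mb, mb}; // ledgers 1..5, 1MB each
        int slowestReaderIndex = 2;          // slowestReaderLedgerId = 3
        System.out.println(deletableUsingTotalSize(sizes, slowestReaderIndex, 2 * mb)); // [1, 2]
        System.out.println(deletableUsingAckedSize(sizes, slowestReaderIndex, 2 * mb)); // []
    }
}
```

With the total-size comparison both acknowledged ledgers are selected for 
deletion; with the acknowledged-size comparison none are, which matches the 
expectation that 2MB of acknowledged data fits a 2MB retention size.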

[1]: 
https://github.com/apache/pulsar/blob/master/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L2695-L2696

Regards,
Yike

From: Yike Xiao 
Sent: Wednesday, April 3, 2024 15:36
To: dev@pulsar.apache.org 
Subject: [DISCUSS] Remove the limitation between backlog quota and retention 
policy

Hi all,

In Pulsar, backlogs consist of unacknowledged messages. Once a message is 
acknowledged by all subscriptions, it enters the retention stage, as clearly 
illustrated in the official documentation [1]. The backlog quota limits the 
size and/or time threshold of the backlog, while the retention policy dictates 
the duration or volume of messages (acknowledged by all subscriptions) to 
retain.

However, I encountered an undocumented restriction that requires the retention 
policy to be greater than the backlog quota. For example, a namespace with a 
backlog quota of 20GB in size and unlimited in time cannot have a retention 
policy of 2GB in size and 7 days in time. This check is implemented in the 
code [3].
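
To make the example concrete, here is a minimal sketch of a size rule that 
rejects that pairing. It is a simplified illustration and not the code behind 
[3]: the class and method names are hypothetical, only the size dimension is 
modelled, and both the handling of equal values and the treatment of 
non-positive retention sizes are assumptions.

```java
// Hypothetical, simplified model of the size part of the restriction.
final class QuotaRetentionCheckSketch {

    /**
     * Returns true when the pair is accepted by the sketched rule.
     * Assumption: a non-positive retention size means "no size-based retention"
     * and is always accepted.
     */
    static boolean sizeRuleAccepts(long backlogQuotaBytes, long retentionSizeBytes) {
        if (retentionSizeBytes <= 0) {
            return true;
        }
        // Assumption: equal values are rejected as well.
        return retentionSizeBytes > backlogQuotaBytes;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024L * 1024L;
        System.out.println(sizeRuleAccepts(20 * gb, 2 * gb)); // false: the example above
        System.out.println(sizeRuleAccepts(2 * gb, 20 * gb)); // true
    }
}
```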

This restriction poses challenges in scenarios where there's a need for a 
larger backlog quota to accommodate extensive usage while desiring a shorter 
retention period for traceback purposes. Such configurations seem unattainable 
under the current constraint.

Given that this limitation has been part of Pulsar since its initial 
open-source release, I am curious about the rationale behind it. Understanding 
the original intent or the technical considerations that led to this 
restriction would be immensely helpful.

Therefore, I propose reevaluating and potentially removing this restriction to 
allow for more flexible configurations of backlog quota and retention policy.

I would greatly appreciate your thoughts on this matter.

Thank you for your time and consideration. I look forward to your response and 
am eager to contribute further to this discussion.

[1]: 
https://pulsar.apache.org/docs/next/cookbooks-retention-expiry/#retention-policies
[3]: 
https://github.com/apache/pulsar/blob/7315aeb6258b7adc9d874268d50acb95ffc0cf2b/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/AdminResource.java#L374-L393

Regards,
Yike


Re: [DISCUSS] Remove the limitation between backlog quota and retention policy

2024-04-10 Thread 太上玄元道君
The limitation doesn't seem to make sense. After
http://github.com/apache/pulsar/pull/22472 is merged, I'll try to fix it.

Thanks,
Tao Jiuming

Yike Xiao wrote on Wed, Apr 10, 2024 at 16:58:

> Here's a hint as to why this restriction was introduced.
> After reviewing the current implementation of the retention policy, I
> found that it decides whether a ledger exceeds the retention size and
> should be deleted using the following logic (pseudocode):
>
> ```
> long totalSizeToDelete = 0;
> for (ledger in ledgersBeforeSlowestReaderLedgerId) {
>   totalSizeToDelete += ledger.size;
>   if (managedLedgerTotalSize - totalSizeToDelete >= retentionSize) {
>     // add this ledger to ledgersToDelete list
>   }
> }
> ```
>
> The actual code can be seen at: [1]
>
> The issue lies in the condition
> `managedLedgerTotalSize - totalSizeToDelete >= retentionSize`, which uses
> `managedLedgerTotalSize` instead of
> `SUM(ledgerSize(ledgersBeforeSlowestReaderLedgerId))`. Because
> `managedLedgerTotalSize` also counts ledgers that have not been
> acknowledged yet, those ledgers are included in the retention size
> comparison as well.
>
> Here is an example to illustrate the problem:
> Suppose the current ManagedLedger has 5 ledgers, [1, 2, 3, 4, 5], each
> 1MB in size, and the retentionSize is set to 2MB. The current
> slowestReaderLedgerId is 3, so ledgers 1 and 2 are fully acknowledged and
> together hold exactly 2MB. They are expected to be retained, but the
> current implementation deletes both: the check evaluates to
> 5MB - 1MB = 4MB >= 2MB for ledger 1 and to 5MB - 2MB = 3MB >= 2MB for
> ledger 2.
> The retention policy will only actually take effect when the retentionSize
> exceeds the backlog quota.
>
> [1]:
> https://github.com/apache/pulsar/blob/master/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L2695-L2696
>
> Regards,
> Yike
> 
> From: Yike Xiao 
> Sent: Wednesday, April 3, 2024 15:36
> To: dev@pulsar.apache.org 
> Subject: [DISCUSS] Remove the limitation between backlog quota and
> retention policy
>
> Hi all,
>
> In Pulsar, backlogs consist of unacknowledged messages. Once a message is
> acknowledged by all subscriptions, it enters the retention stage, as
> clearly illustrated in the official documentation [1]. The backlog quota
> limits the size and/or time threshold of the backlog, while the retention
> policy dictates the duration or volume of messages (acknowledged by all
> subscriptions) to retain.
>
> However, I encountered an undocumented restriction that requires the
> retention policy to be greater than the backlog quota. For example, a
> namespace with a backlog quota of 20GB in size and unlimited in time
> cannot have a retention policy of 2GB in size and 7 days in time. This
> check is implemented in the code [3].
>
> This restriction poses challenges in scenarios where there's a need for a
> larger backlog quota to accommodate extensive usage while desiring a
> shorter retention period for traceback purposes. Such configurations seem
> unattainable under the current constraint.
>
> Given that this limitation has been part of Pulsar since its initial
> open-source release, I am curious about the rationale behind it.
> Understanding the original intent or the technical considerations that led
> to this restriction would be immensely helpful.
>
> Therefore, I propose reevaluating and potentially removing this
> restriction to allow for more flexible configurations of backlog quota and
> retention policy.
>
> I would greatly appreciate your thoughts on this matter.
>
> Thank you for your time and consideration. I look forward to your response
> and am eager to contribute further to this discussion.
>
> [1]:
> https://pulsar.apache.org/docs/next/cookbooks-retention-expiry/#retention-policies
> [3]:
> https://github.com/apache/pulsar/blob/7315aeb6258b7adc9d874268d50acb95ffc0cf2b/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/AdminResource.java#L374-L393
>
> Regards,
> Yike
>


RE: [DISCUSSION] Proposal to Replace Pulsar's Homegrown Configuration Framework with Gestalt Config

2024-04-10 Thread Colin Redmond

Hello,

I am the author of the Gestalt library and I am honored that you are 
considering leveraging my library for your project.


There are a lot of really great configuration libraries out there to 
choose from, and I see you have already done your due diligence in 
reviewing many (I would recommend against Lightbend, as it has been put 
into maintenance mode and does not support newer features like Java 
records).

I think my zero-dependency configuration library, which is modular, 
extensible, and feature-rich, would be a great fit for your project. It 
is great that you already have a clear idea of how to leverage the 
library to decouple your configs and increase your security posture.

I am usually quick to reply to issues or bugs and I am open to pull 
requests. If I can help answer any questions, feel free to reach out.



Thanks,

Colin Redmond


On 2024/02/21 09:26:19 Lari Hotari wrote:
> Hello everyone,
>
> I would like to bring up an issue with Pulsar's containers, specifically
> regarding the method of overriding configurations. For instance, the
> Apache Pulsar Helm chart employs "bin/apply-config-from-env.py
> conf/broker.conf" and "bin/gen-yml-from-env.py
> conf/functions_worker.yml" [1] to apply configurations passed in the
> environment to the configuration files in the container's root file system.
> This approach fails when the container's root file system is read-only due to
> strict security policies (`readOnlyRootFilesystem` in
> `securityContext`). This issue has been reported as #22088 [2].
>
> A temporary fix could involve using a temporary file to modify the
> configuration file when the filesystem is read-only. However, the Python
> script solution is not ideal, and we should consider eliminating it. In
> the long term, it would also be beneficial to remove the need for a
> shell script to start Pulsar, but that's a separate issue.
>
> For configuration handling, we need a solution that can apply overrides
> in memory, eliminating the need to modify on-disk files. Modern
> configuration frameworks can do this out-of-the-box. Currently, Pulsar
> uses a homegrown configuration framework. Instead of extending this
> framework, I propose we discuss replacing it with the Gestalt Config
> library [3]. This library, licensed under Apache-2.0, is a mature,
> well-established solution for configuration handling.
>
> Switching to Gestalt Config would allow us to move towards a more
> structured and modular configuration in Pulsar. Our current
> configuration is not modular, as it relies on a "god object" for
> configuration, which collects all possible configuration options.
> Gestalt Config offers modular usage patterns similar to those of
> Spring Boot's external configuration [4] and the MicroProfile Config [5]
> in Quarkus. However, Gestalt Config does not pull in other dependencies,
> giving it an advantage over Spring Boot and Quarkus configuration solutions.
> Other libraries in this category include the Typesafe config library [6]
> from Lightbend with HOCON [7], commonly used in Scala and Akka-based
> applications.
>
> Gestalt Config supports many configuration file formats, including flat
> properties files, yaml, json, toml, and even hocon. It also offers
> security features for reading secrets directly from Vault, AWS Secrets
> Manager, and GCP Secret Manager, without the need to use the file system
> or environment variables to inject secrets into the application
> configuration. This could significantly improve Pulsar's security
> posture.
>
> Pulsar's current "homegrown configuration framework" is quite simple,
> implemented in a few classes with the main logic in
> PulsarConfigurationLoader [8] and FieldParser [9] classes, called from
> the PulsarBrokerStarter class [10].
>
> The main question is: should we continue extending Pulsar's homegrown
> configuration framework, or should we consider adopting a library like
> Gestalt Config for future configuration use case improvements for
> modularity, structured configuration, and security?
>
> Best regards,
>
> Lari
>
> References:
> 1 - https://github.com/apache/pulsar-helm-chart/blob/29ea17b3fceef65160620b9018d0dd0449a168c5/charts/pulsar/templates/broker-statefulset.yaml#L210-L221
> 2 - https://github.com/apache/pulsar/issues/22088
> 3 - https://github.com/gestalt-config/gestalt
> 4 - https://docs.spring.io/spring-boot/docs/current/reference/html/features.html#features.external-config
> 5 - https://microprofile.io/specifications/microprofile-config/
> 6 - https://github.com/lightbend/config
> 7 - https://github.com/lightbend/config/blob/main/HOCON.md
> 8 - https://github.com/apache/pulsar/blob/master/pulsar-broker-common/src/main/java/org/apache/pulsar/common/configuration/PulsarConfigurationLoader.java
> 9 - https://github.com/apache/pulsar/blob/master/pulsar-common/src/main/java/org/apache/pulsar/common/util/FieldParser.java
> 10 - https://github.com/apache/pulsar/blob

Re: [DISCUSSION] Proposal to Replace Pulsar's Homegrown Configuration Framework with Gestalt Config

2024-04-10 Thread Dave Fisher
Moderator's note: Colin is not subscribed to the mailing list. I’ve added them 
to the cc.

> On Apr 10, 2024, at 9:56 AM, Colin Redmond  wrote:
> 
> Hello,
> 
> I am the author of the Gestalt library and I am honored that you are 
> considering leveraging my library for your project.
> 
> There are a lot of really great configuration libraries out there to choose 
> from, and I see you have already done your due diligence in reviewing many 
> (I would recommend against Lightbend, as it has been put into maintenance 
> mode and does not support newer features like Java records).
> 
> I think my zero-dependency configuration library, which is modular, 
> extensible, and feature-rich, would be a great fit for your project. It is 
> great that you already have a clear idea of how to leverage the library to 
> decouple your configs and increase your security posture.
> 
> I am usually quick to reply to issues or bugs and I am open to pull requests. 
> If I can help answer any questions, feel free to reach out.
> 
> 
> Thanks,
> 
> Colin Redmond
> 
> 
> On 2024/02/21 09:26:19 Lari Hotari wrote:
> > Hello everyone,
> >
> > I would like to bring up an issue with Pulsar's containers, specifically
> > regarding the method of overriding configurations. For instance, the
> > Apache Pulsar Helm chart employs "bin/apply-config-from-env.py
> > conf/broker.conf" and "bin/gen-yml-from-env.py
> > conf/functions_worker.yml" [1] to apply configurations passed in the
> > environment to the configuration files in the container's root file system.
> > This approach fails when the container's root file system is read-only due to
> > strict security policies (`readOnlyRootFilesystem` in
> > `securityContext`). This issue has been reported as #22088 [2].
> >
> > A temporary fix could involve using a temporary file to modify the
> > configuration file when the filesystem is read-only. However, the Python
> > script solution is not ideal, and we should consider eliminating it. In
> > the long term, it would also be beneficial to remove the need for a
> > shell script to start Pulsar, but that's a separate issue.
> >
> > For configuration handling, we need a solution that can apply overrides
> > in memory, eliminating the need to modify on-disk files. Modern
> > configuration frameworks can do this out-of-the-box. Currently, Pulsar
> > uses a homegrown configuration framework. Instead of extending this
> > framework, I propose we discuss replacing it with the Gestalt Config
> > library [3]. This library, licensed under Apache-2.0, is a mature,
> > well-established solution for configuration handling.
> >
> > Switching to Gestalt Config would allow us to move towards a more
> > structured and modular configuration in Pulsar. Our current
> > configuration is not modular, as it relies on a "god object" for
> > configuration, which collects all possible configuration options.
> > Gestalt Config offers modular usage patterns similar to those of
> > Spring Boot's external configuration [4] and the MicroProfile Config [5]
> > in Quarkus. However, Gestalt Config does not pull in other dependencies,
> > giving it an advantage over Spring Boot and Quarkus configuration solutions.
> > Other libraries in this category include the Typesafe config library [6]
> > from Lightbend with HOCON [7], commonly used in Scala and Akka-based
> > applications.
> >
> > Gestalt Config supports many configuration file formats, including flat
> > properties files, yaml, json, toml, and even hocon. It also offers
> > security features for reading secrets directly from Vault, AWS Secrets
> > Manager, and GCP Secret Manager, without the need to use the file system
> > or environment variables to inject secrets into the application
> > configuration. This could significantly improve Pulsar's security
> > posture.
> >
> > Pulsar's current "homegrown configuration framework" is quite simple,
> > implemented in a few classes with the main logic in
> > PulsarConfigurationLoader [8] and FieldParser [9] classes, called from
> > the PulsarBrokerStarter class [10].
> >
> > The main question is: should we continue extending Pulsar's homegrown
> > configuration framework, or should we consider adopting a library like
> > Gestalt Config for future configuration use case improvements for
> > modularity, structured configuration, and security?
> >
> > Best regards,
> >
> > Lari
> >
> > References:
> > 1 - https://github.com/apache/pulsar-helm-chart/blob/29ea17b3fceef65160620b9018d0dd0449a168c5/charts/pulsar/templates/broker-statefulset.yaml#L210-L221
> > 2 - https://github.com/apache/pulsar/issues/22088
> > 3 - https://github.com/gestalt-config/gestalt
> > 4 - https://docs.spring.io/spring-boot/docs/current/reference/html/features.html#features.external-config
> > 5 - https://microprofile.io/specifications/microprofile-config/
> > 6 - https://github.com/lightbend/config
> > 7 - https://github.com/lightbend/config/blob/main/HOCON.

[VOTE] Reactive Java client for Apache Pulsar 0.5.4 Candidate 1

2024-04-10 Thread Chris Bono
Following PIP-205: Reactive Java client for Apache Pulsar (
https://github.com/apache/pulsar/issues/17335), this is release
candidate 1 for the Reactive Java client for Apache Pulsar, version 0.5.4.

*** Please download, test and vote on this release. This vote will stay open
for at least 72 hours ***

Note that we are voting upon the source (tag). Binaries in the Maven repository
are provided for convenience.

Source package:
https://dist.apache.org/repos/dist/dev/pulsar/pulsar-client-reactive-0.5.4-candidate-1/

SHA-512 checksums:
f1640fa49571df0bf449f05cf106f417e1f4609020d6c11a54a9e5d601d62b4db7aaa8a59a1ce9e5026a410b2a7bb351b8d0415e1c1e7a3b2487c167a4c8dd5c
 pulsar-client-reactive-0.5.4-src.tar.gz
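
For anyone who wants to script the checksum comparison, a minimal Java sketch 
is shown below. It is not part of the official release validation process 
linked further down. The class and method names are made up for illustration, 
and the file is read fully into memory, which is assumed to be acceptable for 
a source tarball.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for comparing a file against a published SHA-512 value.
final class Sha512CheckSketch {

    /** Returns the lowercase hex SHA-512 digest of the given bytes. */
    static String sha512Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-512");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Not expected on a standard JDK, which ships a SHA-512 provider.
            throw new IllegalStateException(e);
        }
    }

    /** Compares the file's digest with the expected hex string, ignoring case. */
    static boolean matches(Path file, String expectedHex) throws IOException {
        return sha512Hex(Files.readAllBytes(file)).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java Sha512CheckSketch.java <file> <expected-sha512-hex>");
            return;
        }
        System.out.println(matches(Path.of(args[0]), args[1]) ? "checksum OK" : "checksum MISMATCH");
    }
}
```

It would be run with the downloaded `pulsar-client-reactive-0.5.4-src.tar.gz` 
as the first argument and the checksum listed above as the second.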

Maven staging repo:
https://repository.apache.org/content/repositories/orgapachepulsar-1287/

The tag to be voted upon:
v0.5.4-candidate-1 (1139dd84138be1758f078b160b986307af714b10)
https://github.com/apache/pulsar-client-reactive/releases/tag/v0.5.4-candidate-1

Please download the source package, and follow detailed instructions
for pulsar-client-reactive release validation at
https://github.com/apache/pulsar-client-reactive/wiki/Release-process#release-validation
.

Best regards

Chris Bono


Re: [VOTE] Reactive Java client for Apache Pulsar 0.5.4 Candidate 1

2024-04-10 Thread Lari Hotari
+1 (binding)

Validated source artifacts
- checked sha512 checksum
- checked signature

Validated binaries
- ran sample app

-Lari

On 2024/04/10 18:32:06 Chris Bono wrote:
> Following PIP-205: Reactive Java client for Apache Pulsar (
> https://github.com/apache/pulsar/issues/17335), this is release
> candidate 1 for the Reactive Java client for Apache Pulsar, version 0.5.4.
> 
> *** Please download, test and vote on this release. This vote will stay open
> for at least 72 hours ***
> 
> Note that we are voting upon the source (tag). Binaries in the Maven repository
> are provided for convenience.
> 
> Source package:
> https://dist.apache.org/repos/dist/dev/pulsar/pulsar-client-reactive-0.5.4-candidate-1/
> 
> SHA-512 checksums:
> f1640fa49571df0bf449f05cf106f417e1f4609020d6c11a54a9e5d601d62b4db7aaa8a59a1ce9e5026a410b2a7bb351b8d0415e1c1e7a3b2487c167a4c8dd5c
>  pulsar-client-reactive-0.5.4-src.tar.gz
> 
> Maven staging repo:
> https://repository.apache.org/content/repositories/orgapachepulsar-1287/
> 
> The tag to be voted upon:
> v0.5.4-candidate-1 (1139dd84138be1758f078b160b986307af714b10)
> https://github.com/apache/pulsar-client-reactive/releases/tag/v0.5.4-candidate-1
> 
> Please download the source package, and follow detailed instructions
> for pulsar-client-reactive release validation at
> https://github.com/apache/pulsar-client-reactive/wiki/Release-process#release-validation
> .
> 
> Best regards
> 
> Chris Bono
>