On Wed, 26 Feb 2025 at 09:13, Yunze Xu <x...@apache.org> wrote: > > I opened a PR to clean up some recent config changes: > https://github.com/apache/pulsar/pull/24027
I don't think that removing the configuration values would be helpful, though I appreciate that your PR was an effective way to create urgency around solving this problem. I agree with you that the current situation is a mess. I value your efforts to drive improvements in this area, so please don't interpret my comments as resistance to change. The problem extends beyond what we've discussed in this thread so far. In an earlier email to the list [1], I brought up some other issues which are related to how configuration is applied in Docker containers. There's a very dirty solution to modify the config files using Python scripts in Pulsar docker containers, such as "bin/apply-config-from-env.py conf/broker.conf" and "bin/gen-yml-from-env.py conf/functions_worker.yml". The earlier email was focusing on solving that problem by selecting a new configuration library that supports multiple config sources and could directly use environment variables or other sources. Another reason why the Pulsar configuration is a mess is that it's all consolidated in a single location, ServiceConfiguration. From an object-oriented design perspective, ServiceConfiguration has become a "God Object" [2] in the codebase, which creates its own set of problems. In more modern configuration approaches, configuration for modules is kept within each module and is designed to be hierarchical and modular. For example, Spring Boot's configuration solution (when used correctly) supports this approach. In our APIs, we have a classification of audience: Public, LimitedPrivate, and Private with the org.apache.pulsar.common.classification.InterfaceAudience annotation and stability classification of Stable, Evolving, and Unstable. Perhaps configuration options could also be categorized as "Private" when they're settings that shouldn't be primarily configured by end users and are tunables for implementation level details. It seems that "PIP-346: Add a simplified configuration file for Pulsar" [3] intends to solve categorization with "broker-defaults.conf". I initially resisted that approach so that we would address the broader problem and not just add a patch to the current solution. Patching could be a way forward if we aren't able to conclude on long-term improvements and cleaning up the mess. My concern is that a point solution could just add more complexity without resolving the problems around configuration modularity and hierarchy. > Pulsar's configuration is like a trashbox, to which everyone throws > trash without any consideration. Some configs are hard to understand > without looking into the implementation details. The PIP requirement > should prevent such things but it seems that very few contributors > follow it. This is the reason why separating implementation-level detail configs ("Private" audience) and user-level configs ("Public" audience) could be helpful. I guess that a solution in the direction of PIP-346 could help with this categorization in addition to using annotations in the ServiceConfiguration file. I wonder if instead of "broker.conf" and "broker-defaults.conf", there could be two separate files: "broker.conf" for public settings and "broker-internal.conf" for the internal implementation-level detail settings. However, that could introduce new problems such as issues with the "apply-config-from-env.py" script and Pulsar docker container configuration. -Lari 1 - https://lists.apache.org/thread/8splwyrn25pt5rb6ph0yz0tfgknn3hqh 2 - https://en.wikipedia.org/wiki/God_object 3 - https://github.com/apache/pulsar/pull/22274/files On Wed, 26 Feb 2025 at 09:13, Yunze Xu <x...@apache.org> wrote: > > I opened a PR to clean up some recent config changes: > https://github.com/apache/pulsar/pull/24027 > > Pulsar's configuration is like a trashbox, to which everyone throws > trash without any consideration. Some configs are hard to understand > without looking into the implementation details. The PIP requirement > should prevent such things but it seems that very few contributors > follow it. > > I raised my concern two years ago [1] but things are getting worse. > IMHO, most configs do not have any value to exist. For these configs, > no one knows when it should be changed and if the customized value > could be better. AUthors tend to have no responsibility for the config > added by them. They just think, "you should thank me for providing a > config to control the behavior". They never stand from a perspective > of users. > > [1] https://lists.apache.org/thread/j23ny19opmp8jww57gwk7g27b5dvl0ot > > Thanks, > Yunze > > On Wed, Feb 26, 2025 at 2:33 PM WenZhi Feng <thetumb...@apache.org> wrote: > > > > Same confusion about the criterion of PIP. > > > > Wenzhi Feng. > > > > On 2025/02/26 04:42:27 Yunze Xu wrote: > > > Hi all, > > > > > > I noticed two PRs were merged recently and cherry-picked into > > > branch-3.0 and branch-4.0, where 3.0 and 4.0 are LTS releases. > > > > > > https://github.com/apache/pulsar/pull/24012 fixes a bug for the > > > implementation of PIP-322. However, it adds a new configuration > > > without a new PIP or change on the existing PIP. It definitely > > > violates "When is a PIP required" [1]. (Any change to the > > > configuration) > > > > > > https://github.com/apache/pulsar/pull/23697 fixes a message loss issue > > > caused by geo-replication by introducing a new feature flag > > > `supports_repl_dedup_by_lid_and_eid`. It also violates [1] (Any new > > > feature for Pulsar brokers or client) > > > > > > I also noticed https://github.com/apache/pulsar/pull/24025, which adds > > > a new config `managedLedgerOffloadReadThreads` as well. Though it's > > > only cherry-picked into branch-3.3 but not for LTS releases (3.0 and > > > 4.0) > > > > > > I found the definition of "bug-fixes" ambiguous. It seems like a > > > silver bullet that if we want a PR to be included in the LTS release, > > > we only need to say, "it's a bug fix", no matter if it changes > > > anything that requires a PIP. > > > > > > Based on the fact that the PIP rule is not widely followed, should we > > > update and loose the requirement? > > > > > > [1] > > > https://github.com/apache/pulsar/tree/master/pip#when-is-a-pip-required > > > > > > Thanks, > > > Yunze > > >