Removing these configurations is helpful indeed. Once it's included in a release, removing them could be much harder. This PR also reminds the authors: are these configurations really necessary? If so, just use a separate PR to explain your reason and add these configs back again. PIP is suitable for that. If the process of PIP is a formalism that takes much time, I agree with just using a separate simple PR for the purpose, as well as updating the PIP process.
Providing some configs for users to tune is unrealistic. It's exactly equivalent to provide an unstable feature and tell users it could be tuned to a good situation, with no actual testing. For example, a delayed task has a configurable timeout (assuming it's 60 seconds). >From the perspective of a Pulsar vendor, when the customer reported an issue, there is no chance for the vendor to tune configs, which means the whole cluster needs to be restarted again and again until a good situation is reached. The original motivation of this mail list is not related to how to improve the config. It's more like a discussion if we insist on adding configurations endlessly without a formal process. We prevented changes from many contributors and told them we don't want to make configurations more complicated. If it's not the truth, don't pretend to declare something like LTS. Just document it clearly: welcome to contribute configurations for users to tune and we'll resolve the issue in (an uncertain) future. [1] https://github.com/apache/pulsar/pull/24027#pullrequestreview-2643354724 Thanks, Yunze On Wed, Feb 26, 2025 at 4:08 PM Lari Hotari <lhot...@apache.org> wrote: > > On Wed, 26 Feb 2025 at 09:13, Yunze Xu <x...@apache.org> wrote: > > > > I opened a PR to clean up some recent config changes: > > https://github.com/apache/pulsar/pull/24027 > > I don't think that removing the configuration values would be helpful, > though I appreciate that your PR was an effective way to create > urgency around solving this problem. I agree with you that the current > situation is a mess. I value your efforts to drive improvements in > this area, so please don't interpret my comments as resistance to > change. The problem extends beyond what we've discussed in this thread > so far. > > In an earlier email to the list [1], I brought up some other issues > which are related to how configuration is applied in Docker > containers. > There's a very dirty solution to modify the config files using Python > scripts in Pulsar docker containers, such as > "bin/apply-config-from-env.py conf/broker.conf" and > "bin/gen-yml-from-env.py conf/functions_worker.yml". The earlier email > was focusing on solving that problem by selecting a new configuration > library that supports multiple config sources and could directly use > environment variables or other sources. > > Another reason why the Pulsar configuration is a mess is that it's all > consolidated in a single location, ServiceConfiguration. From an > object-oriented design perspective, ServiceConfiguration has become a > "God Object" [2] in the codebase, which creates its own set of > problems. In more modern configuration approaches, configuration for > modules is kept within each module and is designed to be hierarchical > and modular. For example, Spring Boot's configuration solution (when > used correctly) supports this approach. > > In our APIs, we have a classification of audience: Public, > LimitedPrivate, and Private with the > org.apache.pulsar.common.classification.InterfaceAudience annotation > and stability classification of Stable, Evolving, and Unstable. > Perhaps configuration options could also be categorized as "Private" > when they're settings that shouldn't be primarily configured by end > users and are tunables for implementation level details. > > It seems that "PIP-346: Add a simplified configuration file for > Pulsar" [3] intends to solve categorization with > "broker-defaults.conf". I initially resisted that approach so that we > would address the broader problem and not just add a patch to the > current solution. Patching could be a way forward if we aren't able to > conclude on long-term improvements and cleaning up the mess. My > concern is that a point solution could just add more complexity > without resolving the problems around configuration modularity and > hierarchy. > > > Pulsar's configuration is like a trashbox, to which everyone throws > > trash without any consideration. Some configs are hard to understand > > without looking into the implementation details. The PIP requirement > > should prevent such things but it seems that very few contributors > > follow it. > > This is the reason why separating implementation-level detail configs > ("Private" audience) and user-level configs ("Public" audience) could > be helpful. I guess that a solution in the direction of PIP-346 could > help with this categorization in addition to using annotations in the > ServiceConfiguration file. I wonder if instead of "broker.conf" and > "broker-defaults.conf", there could be two separate files: > "broker.conf" for public settings and "broker-internal.conf" for the > internal implementation-level detail settings. However, that could > introduce new problems such as issues with the > "apply-config-from-env.py" script and Pulsar docker container > configuration. > > -Lari > > 1 - https://lists.apache.org/thread/8splwyrn25pt5rb6ph0yz0tfgknn3hqh > 2 - https://en.wikipedia.org/wiki/God_object > 3 - https://github.com/apache/pulsar/pull/22274/files > > > On Wed, 26 Feb 2025 at 09:13, Yunze Xu <x...@apache.org> wrote: > > > > I opened a PR to clean up some recent config changes: > > https://github.com/apache/pulsar/pull/24027 > > > > Pulsar's configuration is like a trashbox, to which everyone throws > > trash without any consideration. Some configs are hard to understand > > without looking into the implementation details. The PIP requirement > > should prevent such things but it seems that very few contributors > > follow it. > > > > I raised my concern two years ago [1] but things are getting worse. > > IMHO, most configs do not have any value to exist. For these configs, > > no one knows when it should be changed and if the customized value > > could be better. AUthors tend to have no responsibility for the config > > added by them. They just think, "you should thank me for providing a > > config to control the behavior". They never stand from a perspective > > of users. > > > > [1] https://lists.apache.org/thread/j23ny19opmp8jww57gwk7g27b5dvl0ot > > > > Thanks, > > Yunze > > > > On Wed, Feb 26, 2025 at 2:33 PM WenZhi Feng <thetumb...@apache.org> wrote: > > > > > > Same confusion about the criterion of PIP. > > > > > > Wenzhi Feng. > > > > > > On 2025/02/26 04:42:27 Yunze Xu wrote: > > > > Hi all, > > > > > > > > I noticed two PRs were merged recently and cherry-picked into > > > > branch-3.0 and branch-4.0, where 3.0 and 4.0 are LTS releases. > > > > > > > > https://github.com/apache/pulsar/pull/24012 fixes a bug for the > > > > implementation of PIP-322. However, it adds a new configuration > > > > without a new PIP or change on the existing PIP. It definitely > > > > violates "When is a PIP required" [1]. (Any change to the > > > > configuration) > > > > > > > > https://github.com/apache/pulsar/pull/23697 fixes a message loss issue > > > > caused by geo-replication by introducing a new feature flag > > > > `supports_repl_dedup_by_lid_and_eid`. It also violates [1] (Any new > > > > feature for Pulsar brokers or client) > > > > > > > > I also noticed https://github.com/apache/pulsar/pull/24025, which adds > > > > a new config `managedLedgerOffloadReadThreads` as well. Though it's > > > > only cherry-picked into branch-3.3 but not for LTS releases (3.0 and > > > > 4.0) > > > > > > > > I found the definition of "bug-fixes" ambiguous. It seems like a > > > > silver bullet that if we want a PR to be included in the LTS release, > > > > we only need to say, "it's a bug fix", no matter if it changes > > > > anything that requires a PIP. > > > > > > > > Based on the fact that the PIP rule is not widely followed, should we > > > > update and loose the requirement? > > > > > > > > [1] > > > > https://github.com/apache/pulsar/tree/master/pip#when-is-a-pip-required > > > > > > > > Thanks, > > > > Yunze > > > >