On Wed, 26 Feb 2025 at 09:13, Yunze Xu <x...@apache.org> wrote:
>
> I opened a PR to clean up some recent config changes:
> https://github.com/apache/pulsar/pull/24027

I don't think that removing the configuration values would be helpful,
though I appreciate that your PR was an effective way to create
urgency around solving this problem. I agree with you that the current
situation is a mess. I value your efforts to drive improvements in
this area, so please don't interpret my comments as resistance to
change. The problem extends beyond what we've discussed in this thread
so far.

In an earlier email to the list [1], I brought up some other issues
which are related to how configuration is applied in Docker
containers.
There's a very dirty solution to modify the config files using Python
scripts in Pulsar docker containers, such as
"bin/apply-config-from-env.py conf/broker.conf" and
"bin/gen-yml-from-env.py conf/functions_worker.yml". The earlier email
was focusing on solving that problem by selecting a new configuration
library that supports multiple config sources and could directly use
environment variables or other sources.

Another reason why the Pulsar configuration is a mess is that it's all
consolidated in a single location, ServiceConfiguration. From an
object-oriented design perspective, ServiceConfiguration has become a
"God Object" [2] in the codebase, which creates its own set of
problems. In more modern configuration approaches, configuration for
modules is kept within each module and is designed to be hierarchical
and modular. For example, Spring Boot's configuration solution (when
used correctly) supports this approach.

In our APIs, we have a classification of audience: Public,
LimitedPrivate, and Private with the
org.apache.pulsar.common.classification.InterfaceAudience annotation
and stability classification of Stable, Evolving, and Unstable.
Perhaps configuration options could also be categorized as "Private"
when they're settings that shouldn't be primarily configured by end
users and are tunables for implementation level details.

It seems that "PIP-346: Add a simplified configuration file for
Pulsar" [3] intends to solve categorization with
"broker-defaults.conf". I initially resisted that approach so that we
would address the broader problem and not just add a patch to the
current solution. Patching could be a way forward if we aren't able to
conclude on long-term improvements and cleaning up the mess. My
concern is that a point solution could just add more complexity
without resolving the problems around configuration modularity and
hierarchy.

> Pulsar's configuration is like a trashbox, to which everyone throws
> trash without any consideration. Some configs are hard to understand
> without looking into the implementation details. The PIP requirement
> should prevent such things but it seems that very few contributors
> follow it.

This is the reason why separating implementation-level detail configs
("Private" audience) and user-level configs ("Public" audience) could
be helpful. I guess that a solution in the direction of PIP-346 could
help with this categorization in addition to using annotations in the
ServiceConfiguration file. I wonder if instead of "broker.conf" and
"broker-defaults.conf", there could be two separate files:
"broker.conf" for public settings and "broker-internal.conf" for the
internal implementation-level detail settings. However, that could
introduce new problems such as issues with the
"apply-config-from-env.py" script and Pulsar docker container
configuration.

-Lari

1 - https://lists.apache.org/thread/8splwyrn25pt5rb6ph0yz0tfgknn3hqh
2 - https://en.wikipedia.org/wiki/God_object
3 - https://github.com/apache/pulsar/pull/22274/files


On Wed, 26 Feb 2025 at 09:13, Yunze Xu <x...@apache.org> wrote:
>
> I opened a PR to clean up some recent config changes:
> https://github.com/apache/pulsar/pull/24027
>
> Pulsar's configuration is like a trashbox, to which everyone throws
> trash without any consideration. Some configs are hard to understand
> without looking into the implementation details. The PIP requirement
> should prevent such things but it seems that very few contributors
> follow it.
>
> I raised my concern two years ago [1] but things are getting worse.
> IMHO, most configs do not have any value to exist. For these configs,
> no one knows when it should be changed and if the customized value
> could be better. AUthors tend to have no responsibility for the config
> added by them. They just think, "you should thank me for providing a
> config to control the behavior". They never stand from a perspective
> of users.
>
> [1] https://lists.apache.org/thread/j23ny19opmp8jww57gwk7g27b5dvl0ot
>
> Thanks,
> Yunze
>
> On Wed, Feb 26, 2025 at 2:33 PM WenZhi Feng <thetumb...@apache.org> wrote:
> >
> > Same confusion about the criterion of PIP.
> >
> > Wenzhi Feng.
> >
> > On 2025/02/26 04:42:27 Yunze Xu wrote:
> > > Hi all,
> > >
> > > I noticed two PRs were merged recently and cherry-picked into
> > > branch-3.0 and branch-4.0, where 3.0 and 4.0 are LTS releases.
> > >
> > > https://github.com/apache/pulsar/pull/24012 fixes a bug for the
> > > implementation of PIP-322. However, it adds a new configuration
> > > without a new PIP or change on the existing PIP. It definitely
> > > violates "When is a PIP required" [1]. (Any change to the
> > > configuration)
> > >
> > > https://github.com/apache/pulsar/pull/23697 fixes a message loss issue
> > > caused by geo-replication by introducing a new feature flag
> > > `supports_repl_dedup_by_lid_and_eid`. It also violates [1] (Any new
> > > feature for Pulsar brokers or client)
> > >
> > > I also noticed https://github.com/apache/pulsar/pull/24025, which adds
> > > a new config `managedLedgerOffloadReadThreads` as well. Though it's
> > > only cherry-picked into branch-3.3 but not for LTS releases (3.0 and
> > > 4.0)
> > >
> > > I found the definition of "bug-fixes" ambiguous. It seems like a
> > > silver bullet that if we want a PR to be included in the LTS release,
> > > we only need to say, "it's a bug fix", no matter if it changes
> > > anything that requires a PIP.
> > >
> > > Based on the fact that the PIP rule is not widely followed, should we
> > > update and loose the requirement?
> > >
> > > [1] 
> > > https://github.com/apache/pulsar/tree/master/pip#when-is-a-pip-required
> > >
> > > Thanks,
> > > Yunze
> > >

Reply via email to