Re: Config for new clients (and server)

Jun Rao Tue, 11 Feb 2014 21:31:43 -0800

I actually think this is useful for non-LinkedIn users as well. The
following is the tradeoff that I see.


Most users probably won't care about seeing an extra 10-20 lines of INFO
level logging when starting up a client. However, it's very easy for users
to (1) misspell a config name (there was an issue on the mailing list just
a few days ago when a user mistyped "advertised.host" as "advertise.host")
or (2) inadvertently override a config value through the config system. In
both cases, the INFO level logging will make it much easier for the user to
notice the mistake. So, I think this is a case where the benefit outweighs
the disadvantage.

Thanks,

Jun
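Both failure modes Jun describes can be checked mechanically by comparing what the user supplied against what the client defines. Below is a minimal sketch in plain Java, assuming the client's defined keys and their defaults are available as a simple map; ConfigAudit and its methods are hypothetical names for illustration, not Kafka's API.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper (not part of any Kafka release): compares user-supplied
// properties against the defined keys and their defaults.
final class ConfigAudit {
    private ConfigAudit() {}

    // Supplied keys that no definition recognizes; usually typos such as
    // "advertise.host" for "advertised.host".
    static Set<String> unusedKeys(Map<String, String> defaults, Map<String, String> supplied) {
        Set<String> unused = new TreeSet<>(supplied.keySet());
        unused.removeAll(defaults.keySet());
        return unused;
    }

    // Defined keys whose supplied value differs from the default.
    static Map<String, String> overriddenValues(Map<String, String> defaults, Map<String, String> supplied) {
        Map<String, String> overridden = new TreeMap<>();
        for (Map.Entry<String, String> e : supplied.entrySet()) {
            if (defaults.containsKey(e.getKey())
                    && !Objects.equals(defaults.get(e.getKey()), e.getValue())) {
                overridden.put(e.getKey(), e.getValue());
            }
        }
        return overridden;
    }
}
```

With "advertised.host" defined and "advertise.host" supplied, unusedKeys reports the misspelled key; a client would log both results once at startup, at whichever level the thread settles on.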


On Mon, Feb 10, 2014 at 4:13 PM, Jay Kreps <jay.kr...@gmail.com> wrote:

> Yeah, I am aware of how ZooKeeper behaves; I think it is kind of gross.
>
> I think logging it at DEBUG gets you what you want: by default we don't
> pollute logs, but anyone who wants to log this can enable DEBUG logging on
> org.apache.kafka.clients.producer.ProducerConfig.
>
> If we want this on by default at LinkedIn, we can just set this logger to
> DEBUG in our wrapper; we don't need to inflict this on everyone.
>
> The point is that spewing out each config IS debug output according to our
> definition:
>   http://kafka.apache.org/coding-guide.html
>
> -Jay
>
>
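Jay's alternative, logging each value at DEBUG on a dedicated logger that operators can switch on individually, can be sketched with java.util.logging, where FINE plays the role of DEBUG. Using java.util.logging is an assumption made to keep the example dependency-free; the logger name is the class name Jay mentions, and ConfigLogging is a hypothetical helper.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch using java.util.logging (an assumption; FINE stands in for DEBUG).
// Each final config value is logged on a logger named after the config class,
// so nothing appears unless an operator enables FINE for that one logger.
final class ConfigLogging {
    static final String LOGGER_NAME = "org.apache.kafka.clients.producer.ProducerConfig";
    // Hold a strong reference: java.util.logging may discard unreferenced loggers.
    static final Logger LOG = Logger.getLogger(LOGGER_NAME);

    private ConfigLogging() {}

    static void logValues(Map<String, ?> values) {
        // Skip building messages entirely when FINE is disabled.
        if (!LOG.isLoggable(Level.FINE)) {
            return;
        }
        // Sorted order keeps startup logs easy to diff between deployments.
        for (Map.Entry<String, Object> e : new TreeMap<String, Object>(values).entrySet()) {
            LOG.fine(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Turning it on is then purely a logging-configuration change for that logger name (its level plus a handler that accepts FINE records, since the default console handler only prints INFO and above), which is the "set this logger to DEBUG in our wrapper" option.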
> On Mon, Feb 10, 2014 at 2:01 PM, Jun Rao <jun...@gmail.com> wrote:
>
> > I actually prefer to see those at INFO level. The reason is that the
> > config system in an application can be complex. Some configs can be
> > overridden in different layers and it may not be easy to determine what
> > the final binding value is. The logging in Kafka will serve as the source
> > of truth.
> >
> > For reference, the ZK client logs all overridden values during
> > initialization. It's a one-time thing during startup, so it shouldn't add
> > much noise. It's very useful for debugging subtle config issues.
> >
> > Exposing final configs programmatically is potentially useful. If we
> > don't want to log overridden values out of the box, an app can achieve
> > the same thing using the programmatic API. The only missing piece is that
> > we won't know about unused property keys, which is probably less
> > important than seeing the overridden values.
> >
> > Thanks,
> >
> > Jun
> >
> >
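Jun's concern about values overridden in different layers suggests recording, for each key, which layer supplied the final binding value. A minimal sketch, assuming the application merges an ordered list of named layers (for example defaults, a properties file, then command-line overrides); the layer names and the LayeredConfig class are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical: resolves configs from ordered layers (later layers win) and
// remembers which layer supplied each final value.
final class LayeredConfig {
    // key -> final value
    final Map<String, String> values = new TreeMap<>();
    // key -> name of the layer that supplied the final value
    final Map<String, String> sources = new TreeMap<>();

    // LinkedHashMap makes the layer order explicit: first is lowest priority.
    LayeredConfig(LinkedHashMap<String, Map<String, String>> layers) {
        for (Map.Entry<String, Map<String, String>> layer : layers.entrySet()) {
            for (Map.Entry<String, String> e : layer.getValue().entrySet()) {
                values.put(e.getKey(), e.getValue());
                sources.put(e.getKey(), layer.getKey());
            }
        }
    }

    // One line per key, e.g. "acks = all (from: file)", suitable for a startup log.
    String describe() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : values.entrySet()) {
            sb.append(e.getKey()).append(" = ").append(e.getValue())
              .append(" (from: ").append(sources.get(e.getKey())).append(")\n");
        }
        return sb.toString();
    }
}
```

Exposing values and sources programmatically covers the "final configs via the API" idea; describe() is the log-once-at-startup variant.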
> > On Mon, Feb 10, 2014 at 10:15 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
> >
> > > Hey Jun,
> > >
> > > I think that is reasonable, but would you object to having it be DEBUG
> > > logging? I think logging a bunch of noise during normal operation in a
> > > client library is pretty ugly. Also, is there value in exposing the
> > > final configs programmatically?
> > >
> > > -Jay
> > >
> > >
> > >
> > > On Sun, Feb 9, 2014 at 9:23 PM, Jun Rao <jun...@gmail.com> wrote:
> > >
> > > > +1 on the new config. Just one comment. Currently, when initializing a
> > > > config (e.g. ProducerConfig), we log the overridden property values and
> > > > unused property keys (likely due to misspelling). This has been very
> > > > useful for config verification. It would be good to add similar support
> > > > in the new config.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Tue, Feb 4, 2014 at 9:34 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > > >
> > > > > We touched on this a bit in previous discussions, but I wanted to
> > > > > draw out the approach to config specifically as an item of discussion.
> > > > >
> > > > > The new producer and consumer use a similar key-value config approach
> > > > > as the existing Scala clients but have different implementation code
> > > > > to help define these configs. The plan is to use the same approach on
> > > > > the server once the new clients are complete, so if we agree on this
> > > > > approach it will be the new default across the board.
> > > > >
> > > > > Let me split this into two parts. First I will try to motivate the use
> > > > > of key-value pairs as a configuration API. Then let me discuss the
> > > > > mechanics of specifying and parsing these. If we agree on the public
> > > > > API, then the implementation details are interesting, as this code will
> > > > > be shared across the producer, consumer, and broker and potentially
> > > > > some tools; but if we disagree about the API, then there is no point in
> > > > > discussing the implementation.
> > > > >
> > > > > Let me explain the rationale for this. In a sense a key-value map of
> > > > > configs is the worst possible API for the programmer using the
> > > > > clients. Let me contrast the pros and cons versus a POJO and motivate
> > > > > why I think it is still superior overall.
> > > > >
> > > > > Pro: An application can externalize the configuration of its Kafka
> > > > > clients into its own configuration. Whatever config management system
> > > > > the client application is using will likely support key-value pairs,
> > > > > so the client should be able to directly pull whatever configurations
> > > > > are present and use them in its client. This means that any
> > > > > configuration the client supports can be added to any application at
> > > > > runtime. With the POJO approach the client application has to expose
> > > > > each POJO getter as some config parameter. The result of many
> > > > > applications doing this is that the config is different for each, and
> > > > > it is very hard to have a standard client config shared across them.
> > > > > Moving config into config files also enables the usual tooling
> > > > > (version control, review, audit, config deployments separate from
> > > > > code pushes, etc.).
> > > > >
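To illustrate the externalization point: an application that already keeps its settings in a java.util.Properties source can hand any client setting straight through without naming individual keys. The sketch assumes a hypothetical convention in which the application prefixes client settings with "kafka.producer."; the prefix and the ClientConfigs helper are illustrative, not something the client requires.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: copies every "<prefix>xyz" entry of the application's
// own configuration into a client config map under the key "xyz".
final class ClientConfigs {
    private ClientConfigs() {}

    static Map<String, Object> withPrefix(Properties appConfig, String prefix) {
        Map<String, Object> clientConfig = new HashMap<>();
        for (String name : appConfig.stringPropertyNames()) {
            if (name.startsWith(prefix)) {
                clientConfig.put(name.substring(prefix.length()), appConfig.getProperty(name));
            }
        }
        return clientConfig;
    }
}
```

Because the application never names individual settings, a config that the client adds in a later release can be set in the application's config file with no code change.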
> > > > > Pro: Backwards and forwards compatibility. Provided we stick to our
> > > > > Java API, many internals can evolve and expose new configs. The
> > > > > application can support both the new and old client by just
> > > > > specifying a config that will be unused in the older version (and of
> > > > > course the reverse: we can remove obsolete configs).
> > > > >
> > > > > Pro: We can use a similar mechanism for both the client and the
> > > > > server. Since most people run the server as a stand-alone process, it
> > > > > needs a config file.
> > > > >
> > > > > Pro: Systems like Samza that need to ship configs across the network
> > > > > can easily do so, as configs have a natural serialized form. This can
> > > > > be done with POJOs using Java serialization, but it is ugly and has
> > > > > bizarre failure cases.
> > > > >
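The "natural serialized form" point can be shown with the JDK alone: a flat string-to-string config round-trips through the standard properties text format. A minimal sketch using java.util.Properties; ConfigWire is a hypothetical name, and this is not a description of how Samza actually ships its configs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Serializes a flat key-value config to properties text and back.
final class ConfigWire {
    private ConfigWire() {}

    static String serialize(Map<String, String> config) {
        Properties props = new Properties();
        props.putAll(config);
        StringWriter out = new StringWriter();
        try {
            // Writes a timestamp comment line, then one escaped "key=value" line per entry.
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    static Map<String, String> deserialize(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text)); // comment lines are skipped
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> config = new HashMap<>();
        for (String name : props.stringPropertyNames()) {
            config.put(name, props.getProperty(name));
        }
        return config;
    }
}
```

Values containing separators such as ":" or "=" survive the round trip because store escapes them and load unescapes them.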
> > > > > Con: The IDE gives nice auto-completion for POJOs.
> > > > >
> > > > > Con: There are some advantages to javadoc as a documentation mechanism
> > > > > for Java people.
> > > > >
> > > > > Basically, to me this is about operability versus niceness of API, and
> > > > > I think operability is more important.
> > > > >
> > > > > Let me now give some details of the config support classes in
> > > > > kafka.common.config and how they are intended to be used.
> > > > >
> > > > > The goal of this code is the following:
> > > > > 1. Make specifying configs and their expected types (string, numbers,
> > > > >    lists, etc.) simple and declarative.
> > > > > 2. Allow for simple validation checks (numeric range checks, etc.).
> > > > > 3. Make the config "self-documenting", i.e. we should be able to write
> > > > >    code that generates the configuration documentation off the config
> > > > >    def.
> > > > > 4. Specify default values.
> > > > > 5. Track which configs actually get used.
> > > > > 6. Make it easy to get config values.
> > > > >
> > > > > There are two classes there: ConfigDef and AbstractConfig. ConfigDef
> > > > > defines the specification of the accepted configurations, and
> > > > > AbstractConfig is a helper class for implementing the configuration
> > > > > class. The difference is kind of like the difference between a "class"
> > > > > and an "object": ConfigDef is for specifying the configurations that
> > > > > are accepted; AbstractConfig is the base class for an instance of
> > > > > these configs.
> > > > >
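The class/object distinction, together with goal 5 (tracking which configs get used), fits in a few lines. This is a simplified stand-in, not the real ConfigDef and AbstractConfig: SimpleDef and TrackedConfig are hypothetical names, the definition is just a map from name to default string, and every accessor records the key it reads so unused keys can be reported afterwards.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// The "class" (hypothetical): which keys are accepted and their defaults.
final class SimpleDef {
    final Map<String, String> defaults = new HashMap<>();

    SimpleDef define(String name, String defaultValue) {
        defaults.put(name, defaultValue);
        return this;
    }
}

// The "object" (hypothetical): one set of supplied values, with access tracking.
// A ProducerConfig-style class could extend it, passing a static SimpleDef.
class TrackedConfig {
    private final Map<String, String> originals;
    private final Map<String, String> values = new HashMap<>();
    private final Set<String> used = new HashSet<>();

    TrackedConfig(SimpleDef def, Map<String, String> originals) {
        this.originals = new HashMap<>(originals);
        values.putAll(def.defaults);
        for (Map.Entry<String, String> e : originals.entrySet()) {
            if (def.defaults.containsKey(e.getKey())) {
                values.put(e.getKey(), e.getValue());
            }
        }
    }

    String getString(String key) {
        if (!values.containsKey(key)) {
            throw new IllegalArgumentException("Unknown config: " + key);
        }
        used.add(key); // record the access
        return values.get(key);
    }

    int getInt(String key) {
        return Integer.parseInt(getString(key));
    }

    // Supplied keys the code never read: typos or obsolete settings.
    Set<String> unused() {
        Set<String> unused = new TreeSet<>(originals.keySet());
        unused.removeAll(used);
        return unused;
    }
}
```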
> > > > > You can see this in action here:
> > > > >
> > > > > https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=blob_plain;f=clients/src/main/java/kafka/clients/producer/ProducerConfig.java;hb=HEAD
> > > > >
> > > > > (Ignore the static config names in there for now; I'm not actually
> > > > > sure that is the best approach.)
> > > > >
> > > > > So the way this works is that the config specification is defined as:
> > > > >
> > > > >         config = new ConfigDef().define("bootstrap.brokers", Type.LIST, "documentation")
> > > > >                                 .define("metadata.timeout.ms", Type.LONG, 60 * 1000, atLeast(0), "documentation")
> > > > >                                 .define("max.partition.size", Type.INT, 16384, atLeast(0), "documentation");
> > > > >
> > > > >
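For readers who want to see what could sit behind that chain of define calls, here is a compact, self-contained sketch of a ConfigDef-like class with typed parsing, optional defaults, and an atLeast lower-bound validator. It is an illustration written for this thread, not the code in the linked ProducerConfig.java; it is named MiniConfigDef to make that clear, and details such as accepting comma-separated strings for LIST are assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified analogue of a declarative config definition.
final class MiniConfigDef {
    enum Type { STRING, INT, LONG, LIST }

    interface Validator {
        void ensureValid(String name, Object value);
    }

    // Lower-bound check in the spirit of atLeast(0) in the snippet above.
    static Validator atLeast(long min) {
        return (name, value) -> {
            if (((Number) value).longValue() < min) {
                throw new IllegalArgumentException(name + " must be at least " + min + " but was " + value);
            }
        };
    }

    // Marker meaning "no default": the config is required.
    private static final Object NO_DEFAULT = new Object();

    private static final class Key {
        final Type type;
        final Object defaultValue;
        final Validator validator;
        final String doc; // kept so documentation could be generated from the definition
        Key(Type type, Object defaultValue, Validator validator, String doc) {
            this.type = type;
            this.defaultValue = defaultValue;
            this.validator = validator;
            this.doc = doc;
        }
    }

    // Insertion order preserved, so parsing and docs follow definition order.
    private final Map<String, Key> keys = new LinkedHashMap<>();

    MiniConfigDef define(String name, Type type, String doc) {
        keys.put(name, new Key(type, NO_DEFAULT, null, doc));
        return this;
    }

    MiniConfigDef define(String name, Type type, Object defaultValue, Validator validator, String doc) {
        keys.put(name, new Key(type, defaultValue, validator, doc));
        return this;
    }

    // Converts supplied (usually string) values and defaults to their declared
    // types, then runs validators. Undefined keys are ignored here.
    Map<String, Object> parse(Map<String, ?> props) {
        Map<String, Object> parsed = new HashMap<>();
        for (Map.Entry<String, Key> e : keys.entrySet()) {
            String name = e.getKey();
            Key key = e.getValue();
            Object raw;
            if (props.containsKey(name)) {
                raw = props.get(name);
            } else if (key.defaultValue != NO_DEFAULT) {
                raw = key.defaultValue;
            } else {
                throw new IllegalArgumentException("Missing required config " + name);
            }
            Object value = convert(key.type, raw);
            if (key.validator != null) {
                key.validator.ensureValid(name, value);
            }
            parsed.put(name, value);
        }
        return parsed;
    }

    private static Object convert(Type type, Object raw) {
        if (type == Type.LIST && raw instanceof List) {
            return raw;
        }
        String s = raw.toString().trim();
        switch (type) {
            case STRING: return s;
            case INT: return Integer.parseInt(s);
            case LONG: return Long.parseLong(s);
            case LIST: return s.isEmpty() ? List.of() : Arrays.asList(s.split("\\s*,\\s*"));
            default: throw new IllegalStateException("Unhandled type " + type);
        }
    }
}
```

Converting defaults through the same path means the int literal 60 * 1000 becomes a Long for the LONG-typed "metadata.timeout.ms", so callers see one type per key regardless of where the value came from.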
> > > > > This is used in a ProducerConfig class, which extends AbstractConfig
> > > > > to get access to some helper methods as well as the logic for tracking
> > > > > which configs get accessed.
> > > > >
> > > > > Currently I have included static String variables for each of the
> > > > > config names in that class. However, I actually think that is not very
> > > > > helpful, as the javadoc for them doesn't give the constant value and
> > > > > requires duplicating the documentation. To understand this point, look
> > > > > at the javadoc and note that the doc on the string is not the same as
> > > > > what we define in the ConfigDef. We could just have the javadoc for
> > > > > the config string be the source of truth, but it is actually pretty
> > > > > inconvenient for that, as it doesn't show you the value of the
> > > > > constant, just the variable name (unless you discover how to unhide
> > > > > it). That is fine for the clients, but for the server it would be very
> > > > > weird, especially for non-Java people. We could attempt to duplicate
> > > > > documentation between the javadoc and the ConfigDef, but given our
> > > > > struggle to get well-documented config in a single place, this seems
> > > > > unwise.
> > > > >
> > > > > So I recommend we have a single source of documentation for these,
> > > > > and that that source be the website documentation on configuration,
> > > > > which covers clients and server and is generated off the config defs.
> > > > > The javadoc on KafkaProducer will link to this table, so it should be
> > > > > quite convenient to discover. This makes things a little more
> > > > > typo-prone, but typos should be easily caught by the unused-key
> > > > > detection. This will also make it possible for us to retire configs in
> > > > > the future without causing compile failures, and to add configs
> > > > > without their use breaking backwards compatibility. This is useful
> > > > > during upgrades where you want to be compatible with both the old and
> > > > > new version so you can roll forwards and backwards.
> > > > >
> > > > > -Jay
> > > > >
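Generating the website table from the definitions, as Jay proposes, mostly means iterating over the same metadata the parser uses. A minimal sketch, assuming each definition exposes a name, a type name, an optional default, and a doc string; ConfigDoc, its Entry class, and the HTML layout are hypothetical.

```java
import java.util.List;

// Hypothetical: renders config definitions as an HTML table, so the website
// documentation is generated from the same source the code parses against.
final class ConfigDoc {
    static final class Entry {
        final String name, type, defaultValue, doc;
        Entry(String name, String type, String defaultValue, String doc) {
            this.name = name;
            this.type = type;
            this.defaultValue = defaultValue; // null means "no default" (required)
            this.doc = doc;
        }
    }

    private ConfigDoc() {}

    static String toHtmlTable(List<Entry> entries) {
        StringBuilder sb = new StringBuilder("<table>\n");
        sb.append("<tr><th>Name</th><th>Type</th><th>Default</th><th>Description</th></tr>\n");
        for (Entry e : entries) {
            sb.append("<tr><td>").append(escape(e.name))
              .append("</td><td>").append(escape(e.type))
              .append("</td><td>").append(e.defaultValue == null ? "" : escape(e.defaultValue))
              .append("</td><td>").append(escape(e.doc))
              .append("</td></tr>\n");
        }
        return sb.append("</table>\n").toString();
    }

    // Minimal HTML escaping so doc strings cannot break the table markup.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```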
> > > >
> > >
> >
>
