Re: Config for new clients (and server)

Jun Rao Sun, 09 Feb 2014 21:24:42 -0800

+1 on the new config. Just one comment. Currently, when initiating a config
(e.g. ProducerConfig), we log those overridden property values and unused
property keys (likely due to mis-spelling). This has been very useful for
config verification. It would be good to add similar support in the new
config.


Thanks,

Jun


On Tue, Feb 4, 2014 at 9:34 AM, Jay Kreps <jay.kr...@gmail.com> wrote:

> We touched on this a bit in previous discussions, but I wanted to draw out
> the approach to config specifically as an item of discussion.
>
> The new producer and consumer use a similar key-value config approach as
> the existing scala clients but have different implementation code to help
> define these configs. The plan is to use the same approach on the server,
> once the new clients are complete; so if we agree on this approach it will
> be the new default across the board.
>
> Let me split this into two parts. First I will try to motivate the use of
> key-value pairs as a configuration api. Then let me discuss the mechanics
> of specifying and parsing these. If we agree on the public api then the
> public api then the implementation details are interesting as this will be
> shared across producer, consumer, and broker and potentially some tools;
> but if we disagree about the api then there is no point in discussing the
> implementation.
>
> Let me explain the rationale for this. In a sense a key-value map of
> configs is the worst possible API to the programmer using the clients. Let
> me contrast the pros and cons versus a POJO and motivate why I think it is
> still superior overall.
>
> Pro: An application can externalize the configuration of its kafka clients
> into its own configuration. Whatever config management system the client
> application is using will likely support key-value pairs, so the client
> should be able to directly pull whatever configurations are present and use
> them in its client. This means that any configuration the client supports
> can be added to any application at runtime. With the pojo approach the
> client application has to expose each pojo getter as some config parameter.
> The result of many applications doing this is that the config is different
> for each and it is very hard to have a standard client config shared
> across. Moving config into config files allows the usual tooling (version
> control, review, audit, config deployments separate from code pushes,
> etc.).
>
> Pro: Backwards and forwards compatibility. Provided we stick to our java
> api many internals can evolve and expose new configs. The application can
> support both the new and old client by just specifying a config that will
> be unused in the older version (and of course the reverse--we can remove
> obsolete configs).
>
> Pro: We can use a similar mechanism for both the client and the server.
> Since most people run the server as a stand-alone process it needs a config
> file.
>
> Pro: Systems like Samza that need to ship configs across the network can
> easily do so as configs have a natural serialized form. This can be done
> with pojos using java serialization but it is ugly and has bizare failure
> cases.
>
> Con: The IDE gives nice auto-completion for pojos.
>
> Con: There are some advantages to javadoc as a documentation mechanism for
> java people.
>
> Basically to me this is about operability versus niceness of api and I
> think operability is more important.
>
> Let me now give some details of the config support classes in
> kafka.common.config and how they are intended to be used.
>
> The goal of this code is the following:
> 1. Make specifying configs, their expected type (string, numbers, lists,
> etc) simple and declarative
> 2. Allow for validating simple checks (numeric range checks, etc)
> 3. Make the config "self-documenting". I.e. we should be able to write code
> that generates the configuration documentation off the config def.
> 4. Specify default values.
> 5. Track which configs actually get used.
> 6. Make it easy to get config values.
>
> There are two classes there: ConfigDef and AbstractConfig. ConfigDef
> defines the specification of the accepted configurations and AbstractConfig
> is a helper class for implementing the configuration class. The difference
> is kind of like the difference between a "class" and an "object": ConfigDef
> is for specifying the configurations that are accepted, AbstractConfig is
> the base class for an instance of these configs.
>
> You can see this in action here:
>
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=blob_plain;f=clients/src/main/java/kafka/clients/producer/ProducerConfig.java;hb=HEAD
>
> (Ignore the static config names in there for now...I'm not actually sure
> that is the best approach).
>
> So the way this works is that the config specification is defined as:
>
>         config = new ConfigDef().define("bootstrap.brokers", Type.LIST,
> "documentation")
>
>                                 .define("metadata.timeout.ms", Type.LONG,
> 60 * 1000, atLeast(0), "documentation")
>                                 .define("max.partition.size", Type.INT,
> 16384, atLeast(0), "documentation")
>
>
> This is used in a ProducerConfig class which extends AbstractConfig to get
> access to some helper methods as well as the logic for tracking which
> configs get accessed.
>
> Currently I have included static String variables for each of the config
> names in that class. However I actually think that is not very helpful as
> the javadoc for them doesn't give the constant value and requires
> duplicating the documentation. To understand this point look at the javadoc
> and note that the doc on the string is not the same as what we define in
> the ConfigDef. We could just have the javadoc for the config string be the
> source of truth but it is actually pretty inconvient for that as it doesn't
> show you the value of the constant, just the variable name (unless you
> discover how to unhide it). That is fine for the clients, but for the
> server would be very weird especially for non-java people. We could attempt
> to duplicate documentation between the javadoc and the ConfigDef but given
> our struggle to get well-documented config in a single place this seems
> unwise.
>
> So I recommend we have a single source for documentation of these and that
> that source be the website documentation on configuration that covers
> clients and server and that that be generated off the config defs. The
> javadoc on KafkaProducer will link to this table so it should be quite
> convenient to discover. This makes things a little more typo prone, but
> that should be easily caught by the key detection. This will also make it
> possible for us to retire configs in the future without causing compile
> failures and add configs without having use of them break backwards
> compatibility. This is useful during upgrades where you want to be
> compatible with the old and new version so you can roll forwards and
> backwards.
>
> -Jay
>

Re: Config for new clients (and server)

Reply via email to