I would agree, Jun.

> On Feb 11, 2014, at 21:30, Jun Rao <jun...@gmail.com> wrote:
> 
> I actually think this is useful for non-LinkedIn users as well. The
> following is the tradeoff that I see.
> 
> Most users probably won't care about seeing an extra 10-20 lines of INFO
> level logging when starting up a client. However, it's very easy for users
> to (1) misspell a config name (there was an issue on the mailing list just
> a few days ago where a user mistyped "advertised.host" as "advertise.host")
> or (2) inadvertently override a config value through the config system. In
> both cases, the INFO level logging will make it much easier for the user to
> spot the mistake. So, I think this is a case where the benefit
> outweighs the disadvantage.
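> 
> To illustrate, here is a minimal sketch of the kind of check I have in
> mind (configDef.names() and log are placeholder names, not the actual
> api):
> 
>     // warn about any supplied key that doesn't match a defined config,
>     // e.g. "advertise.host" instead of "advertised.host"
>     for (Object key : props.keySet())
>         if (!configDef.names().contains(key.toString()))
>             log.warn("Unknown config key (possible typo?): " + key);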
> 
> Thanks,
> 
> Jun
> 
> 
>> On Mon, Feb 10, 2014 at 4:13 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>> 
>> Yeah I am aware of how zookeeper behaves, I think it is kind of gross.
>> 
>> I think logging it at DEBUG gets you what you want--by default we don't
>> pollute logs, but anyone who wants to log this can enable DEBUG logging on
>> org.apache.kafka.clients.producer.ProducerConfig.
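>> 
>> For example, anyone who wants the config dump could turn it on in their
>> log4j.properties (assuming the usual log4j setup):
>> 
>>     log4j.logger.org.apache.kafka.clients.producer.ProducerConfig=DEBUG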
>> 
>> If we want this on by default at LinkedIn we can just set this logger to
>> debug in our wrapper; we don't need to inflict this on everyone.
>> 
>> The point is that spewing out each config IS debug logging according to
>> our definition:
>>  http://kafka.apache.org/coding-guide.html
>> 
>> -Jay
>> 
>> 
>>> On Mon, Feb 10, 2014 at 2:01 PM, Jun Rao <jun...@gmail.com> wrote:
>>> 
>>> I actually prefer to see those at INFO level. The reason is that the
>>> config system in an application can be complex. Some configs can be
>>> overridden in different layers and it may not be easy to determine what
>>> the final binding value is. The logging in Kafka will serve as the source
>>> of truth.
>>> 
>>> For reference, the ZK client logs all overridden values during
>>> initialization. It's a one-time thing during startup, so it shouldn't add
>>> much noise. It's very useful for debugging subtle config issues.
>>> 
>>> Exposing final configs programmatically is potentially useful. If we
>>> don't want to log overridden values out of the box, an app can achieve
>>> the same thing using the programmatic api. The only missing piece is that
>>> we won't know about unused property keys, which is probably less
>>> important than seeing the overridden values.
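>>> 
>>> Something like this sketch is what I mean by the programmatic route (the
>>> values() accessor is hypothetical, just to show the shape):
>>> 
>>>     ProducerConfig config = new ProducerConfig(props);
>>>     // hypothetical accessor returning the final, post-override values
>>>     for (Map.Entry<String, Object> e : config.values().entrySet())
>>>         System.out.println(e.getKey() + " = " + e.getValue());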
>>> 
>>> Thanks,
>>> 
>>> Jun
>>> 
>>> 
>>>> On Mon, Feb 10, 2014 at 10:15 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>>> 
>>>> Hey Jun,
>>>> 
>>>> I think that is reasonable, but would you object to having it be debug
>>>> logging? I think logging out a bunch of noise during normal operation in
>>>> a client library is pretty ugly. Also, is there value in exposing the
>>>> final configs programmatically?
>>>> 
>>>> -Jay
>>>> 
>>>> 
>>>> 
>>>>> On Sun, Feb 9, 2014 at 9:23 PM, Jun Rao <jun...@gmail.com> wrote:
>>>>> 
>>>>> +1 on the new config. Just one comment. Currently, when instantiating a
>>>>> config (e.g. ProducerConfig), we log the overridden property values and
>>>>> unused property keys (likely due to misspelling). This has been very
>>>>> useful for config verification. It would be good to add similar support
>>>>> in the new config.
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Jun
>>>>> 
>>>>> 
>>>>> On Tue, Feb 4, 2014 at 9:34 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>>>> 
>>>>>> We touched on this a bit in previous discussions, but I wanted to draw
>>>>>> out the approach to config specifically as an item of discussion.
>>>>>> 
>>>>>> The new producer and consumer use a similar key-value config approach
>>>>>> as the existing scala clients but have different implementation code
>>>>>> to help define these configs. The plan is to use the same approach on
>>>>>> the server once the new clients are complete; so if we agree on this
>>>>>> approach it will be the new default across the board.
>>>>>> 
>>>>>> Let me split this into two parts. First I will try to motivate the use
>>>>>> of key-value pairs as a configuration api. Then let me discuss the
>>>>>> mechanics of specifying and parsing these. If we agree on the public
>>>>>> api then the implementation details are interesting, as this will be
>>>>>> shared across producer, consumer, and broker and potentially some
>>>>>> tools; but if we disagree about the api then there is no point in
>>>>>> discussing the implementation.
>>>>>> 
>>>>>> Let me explain the rationale for this. In a sense a key-value map of
>>>>>> configs is the worst possible API to the programmer using the clients.
>>>>>> Let me contrast the pros and cons versus a POJO and motivate why I
>>>>>> think it is still superior overall.
>>>>>> 
>>>>>> Pro: An application can externalize the configuration of its kafka
>>>>>> clients into its own configuration. Whatever config management system
>>>>>> the client application is using will likely support key-value pairs,
>>>>>> so the client should be able to directly pull whatever configurations
>>>>>> are present and use them in its client. This means that any
>>>>>> configuration the client supports can be added to any application at
>>>>>> runtime. With the pojo approach the client application has to expose
>>>>>> each pojo getter as some config parameter. The result of many
>>>>>> applications doing this is that the config is different for each and
>>>>>> it is very hard to have a standard client config shared across
>>>>>> applications. Moving config into config files allows the usual tooling
>>>>>> (version control, review, audit, config deployments separate from code
>>>>>> pushes, etc.).
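>>>>>> 
>>>>>> As a sketch of what this enables (assuming the new producer takes a
>>>>>> plain Properties/Map in its constructor, which is the intent here):
>>>>>> 
>>>>>>     // the app's own config system just passes key-value pairs through
>>>>>>     Properties props = new Properties();
>>>>>>     props.load(new FileInputStream("producer.properties"));
>>>>>>     KafkaProducer producer = new KafkaProducer(props);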
>>>>>> 
>>>>>> Pro: Backwards and forwards compatibility. Provided we stick to our
>>>>>> java api, many internals can evolve and expose new configs. The
>>>>>> application can support both the new and old client by just specifying
>>>>>> a config that will be unused in the older version (and of course the
>>>>>> reverse--we can remove obsolete configs).
>>>>>> 
>>>>>> Pro: We can use a similar mechanism for both the client and the
>>>>>> server. Since most people run the server as a stand-alone process it
>>>>>> needs a config file.
>>>>>> 
>>>>>> Pro: Systems like Samza that need to ship configs across the network
>>>>>> can easily do so, as configs have a natural serialized form. This can
>>>>>> be done with pojos using java serialization but it is ugly and has
>>>>>> bizarre failure cases.
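>>>>>> 
>>>>>> For example, key-value configs already round-trip through the stock
>>>>>> java.util.Properties text format, no custom serialization needed:
>>>>>> 
>>>>>>     StringWriter out = new StringWriter();
>>>>>>     props.store(out, null);                  // serialize to text
>>>>>>     String wireFormat = out.toString();      // ship this across the network
>>>>>>     Properties copy = new Properties();
>>>>>>     copy.load(new StringReader(wireFormat)); // rehydrate on the other side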
>>>>>> 
>>>>>> Con: The IDE gives nice auto-completion for pojos.
>>>>>> 
>>>>>> Con: There are some advantages to javadoc as a documentation mechanism
>>>>>> for java people.
>>>>>> 
>>>>>> Basically to me this is about operability versus niceness of api, and
>>>>>> I think operability is more important.
>>>>>> 
>>>>>> Let me now give some details of the config support classes in
>>>>>> kafka.common.config and how they are intended to be used.
>>>>>> 
>>>>>> The goal of this code is the following:
>>>>>> 1. Make specifying configs, their expected type (string, numbers,
>>>>>> lists, etc.) simple and declarative
>>>>>> 2. Allow simple validation checks (numeric range checks, etc.)
>>>>>> 3. Make the config "self-documenting", i.e. we should be able to write
>>>>>> code that generates the configuration documentation off the config
>>>>>> def.
>>>>>> 4. Specify default values.
>>>>>> 5. Track which configs actually get used.
>>>>>> 6. Make it easy to get config values.
>>>>>> 
>>>>>> There are two classes there: ConfigDef and AbstractConfig. ConfigDef
>>>>>> defines the specification of the accepted configurations and
>>>>>> AbstractConfig is a helper class for implementing the configuration
>>>>>> class. The difference is kind of like the difference between a "class"
>>>>>> and an "object": ConfigDef is for specifying the configurations that
>>>>>> are accepted, AbstractConfig is the base class for an instance of
>>>>>> these configs.
>>>>>> 
>>>>>> You can see this in action here:
>>>>>> 
>>>>>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=blob_plain;f=clients/src/main/java/kafka/clients/producer/ProducerConfig.java;hb=HEAD
>>>>>> 
>>>>>> (Ignore the static config names in there for now...I'm not actually
>>>>>> sure that is the best approach.)
>>>>>> 
>>>>>> So the way this works is that the config specification is defined as:
>>>>>> 
>>>>>>     config = new ConfigDef()
>>>>>>         .define("bootstrap.brokers", Type.LIST, "documentation")
>>>>>>         .define("metadata.timeout.ms", Type.LONG, 60 * 1000,
>>>>>>                 atLeast(0), "documentation")
>>>>>>         .define("max.partition.size", Type.INT, 16384,
>>>>>>                 atLeast(0), "documentation")
>>>>>> 
>>>>>> 
>>>>>> This is used in a ProducerConfig class which extends AbstractConfig to
>>>>>> get access to some helper methods as well as the logic for tracking
>>>>>> which configs get accessed.
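>>>>>> 
>>>>>> Roughly, the shape is something like this (a sketch; the exact
>>>>>> constructor signatures may differ):
>>>>>> 
>>>>>>     public class ProducerConfig extends AbstractConfig {
>>>>>>         private static final ConfigDef config = new ConfigDef()
>>>>>>             .define("metadata.timeout.ms", Type.LONG, 60 * 1000,
>>>>>>                     atLeast(0), "documentation");
>>>>>> 
>>>>>>         public ProducerConfig(Map<?, ?> props) {
>>>>>>             // AbstractConfig parses and validates values, applies
>>>>>>             // defaults, and tracks which keys get accessed
>>>>>>             super(config, props);
>>>>>>         }
>>>>>>     }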
>>>>>> 
>>>>>> Currently I have included static String variables for each of the
>>>>>> config names in that class. However I actually think that is not very
>>>>>> helpful, as the javadoc for them doesn't give the constant value and
>>>>>> requires duplicating the documentation. To understand this point, look
>>>>>> at the javadoc and note that the doc on the string is not the same as
>>>>>> what we define in the ConfigDef. We could just have the javadoc for
>>>>>> the config string be the source of truth, but it is actually pretty
>>>>>> inconvenient for that as it doesn't show you the value of the
>>>>>> constant, just the variable name (unless you discover how to unhide
>>>>>> it). That is fine for the clients, but for the server it would be very
>>>>>> weird, especially for non-java people. We could attempt to duplicate
>>>>>> documentation between the javadoc and the ConfigDef, but given our
>>>>>> struggle to get well-documented config in a single place this seems
>>>>>> unwise.
>>>>>> 
>>>>>> So I recommend we have a single source of documentation for these, and
>>>>>> that that source be the website documentation on configuration that
>>>>>> covers clients and server and is generated off the config defs. The
>>>>>> javadoc on KafkaProducer will link to this table so it should be quite
>>>>>> convenient to discover. This makes things a little more typo-prone,
>>>>>> but that should be easily caught by the unused-key detection. This
>>>>>> will also make it possible for us to retire configs in the future
>>>>>> without causing compile failures and to add configs without having
>>>>>> their use break backwards compatibility. This is useful during
>>>>>> upgrades where you want to be compatible with both the old and new
>>>>>> version so you can roll forwards and backwards.
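>>>>>> 
>>>>>> Generating that table is then a simple walk over the def (sketch only;
>>>>>> the keys() accessor and field names here are hypothetical):
>>>>>> 
>>>>>>     StringBuilder table = new StringBuilder("<table>\n");
>>>>>>     for (ConfigDef.ConfigKey key : config.keys())  // hypothetical
>>>>>>         table.append("<tr><td>" + key.name + "</td><td>"
>>>>>>                      + key.defaultValue + "</td><td>"
>>>>>>                      + key.documentation + "</td></tr>\n");
>>>>>>     table.append("</table>");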
>>>>>> 
>>>>>> -Jay
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
