I would agree, Jun.
> On Feb 11, 2014, at 21:30, Jun Rao <jun...@gmail.com> wrote:
>
> I actually think this is useful for non-LinkedIn users as well. The following is the tradeoff that I see.
>
> Most users probably won't care about seeing an extra 10-20 lines of INFO-level logging when starting up a client. However, it's very easy for users to (1) misspell a config name (there was an issue on the mailing list just a few days ago when a user mistyped "advertised.host" as "advertise.host") or (2) inadvertently override a config value through the config system. In both cases, the INFO-level logging will make it much easier for the user to realize the human mistake. So, I think this is a case where the benefit outweighs the disadvantage.
>
> Thanks,
>
> Jun
>
>> On Mon, Feb 10, 2014 at 4:13 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>
>> Yeah, I am aware of how ZooKeeper behaves; I think it is kind of gross.
>>
>> I think logging it at DEBUG gets you what you want--by default we don't pollute logs, but anyone who wants to log this can enable DEBUG logging on org.apache.kafka.clients.producer.ProducerConfig.
>>
>> If we want this on by default at LinkedIn we can just set this logger to DEBUG in our wrapper; we don't need to inflict this on everyone.
>>
>> The point is that spewing out each config IS a debug message according to our definition:
>> http://kafka.apache.org/coding-guide.html
>>
>> -Jay
>>
>>> On Mon, Feb 10, 2014 at 2:01 PM, Jun Rao <jun...@gmail.com> wrote:
>>>
>>> I actually prefer to see those at INFO level. The reason is that the config system in an application can be complex. Some configs can be overridden in different layers, and it may not be easy to determine what the final binding value is. The logging in Kafka will serve as the source of truth.
>>>
>>> For reference, the ZK client logs all overridden values during initialization. It's a one-time thing during startup, so it shouldn't add much noise. It's very useful for debugging subtle config issues.
>>>
>>> Exposing final configs programmatically is potentially useful. If we don't want to log overridden values out of the box, an app can achieve the same thing using the programmatic API. The only missing piece is that we won't know the unused property keys, which is probably less important than seeing the overridden values.
>>>
>>> Thanks,
>>>
>>> Jun
>>>
>>>> On Mon, Feb 10, 2014 at 10:15 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>>>
>>>> Hey Jun,
>>>>
>>>> I think that is reasonable, but would you object to having it be debug logging? I think logging out a bunch of noise during normal operation in a client library is pretty ugly. Also, is there value in exposing the final configs programmatically?
>>>>
>>>> -Jay
>>>>
>>>>> On Sun, Feb 9, 2014 at 9:23 PM, Jun Rao <jun...@gmail.com> wrote:
>>>>>
>>>>> +1 on the new config. Just one comment. Currently, when instantiating a config (e.g. ProducerConfig), we log the overridden property values and unused property keys (likely due to misspelling). This has been very useful for config verification. It would be good to add similar support in the new config.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jun
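To make Jay's opt-in concrete: with log4j (what the clients logged through at the time), turning the per-config dump back on is a one-line logger override. A minimal sketch, assuming the standard log4j 1.x properties format:

    # Re-enable per-config logging for the new producer only
    log4j.logger.org.apache.kafka.clients.producer.ProducerConfig=DEBUG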
>>>>> On Tue, Feb 4, 2014 at 9:34 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>>>>
>>>>>> We touched on this a bit in previous discussions, but I wanted to draw out the approach to config specifically as an item of discussion.
>>>>>>
>>>>>> The new producer and consumer use a similar key-value config approach to the existing Scala clients but have different implementation code to help define these configs. The plan is to use the same approach on the server once the new clients are complete; so if we agree on this approach it will be the new default across the board.
>>>>>>
>>>>>> Let me split this into two parts. First I will try to motivate the use of key-value pairs as a configuration API. Then let me discuss the mechanics of specifying and parsing these. If we agree on the public API, then the implementation details are interesting, as they will be shared across producer, consumer, and broker, and potentially some tools; but if we disagree about the API then there is no point in discussing the implementation.
>>>>>>
>>>>>> Let me explain the rationale for this. In a sense a key-value map of configs is the worst possible API for the programmer using the clients. Let me contrast the pros and cons versus a POJO and motivate why I think it is still superior overall.
>>>>>>
>>>>>> Pro: An application can externalize the configuration of its Kafka clients into its own configuration (see the sketch just after this list). Whatever config management system the client application is using will likely support key-value pairs, so the client should be able to directly pull whatever configurations are present and use them in its client. This means that any configuration the client supports can be added to any application at runtime. With the POJO approach the client application has to expose each POJO getter as some config parameter. The result of many applications doing this is that the config is different for each, and it is very hard to have a standard client config shared across them. Moving config into config files allows the usual tooling (version control, review, audit, config deployments separate from code pushes, etc.).
>>>>>>
>>>>>> Pro: Backwards and forwards compatibility. Provided we stick to our Java API, many internals can evolve and expose new configs. The application can support both the new and old client by just specifying a config that will be unused in the older version (and of course the reverse--we can remove obsolete configs).
>>>>>>
>>>>>> Pro: We can use a similar mechanism for both the client and the server. Since most people run the server as a stand-alone process, it needs a config file.
>>>>>>
>>>>>> Pro: Systems like Samza that need to ship configs across the network can easily do so, as configs have a natural serialized form. This can be done with POJOs using Java serialization, but it is ugly and has bizarre failure cases.
>>>>>>
>>>>>> Con: The IDE gives nice auto-completion for POJOs.
>>>>>>
>>>>>> Con: There are some advantages to javadoc as a documentation mechanism for Java people.
>>>>>>
>>>>>> Basically, to me this is about operability versus niceness of API, and I think operability is more important.
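The first Pro in practice, as a minimal sketch: the file path is hypothetical, and the constructor shape is the one the new producer API eventually settled on (the details were still in flux in this thread), so treat the specifics as assumptions rather than the final API.

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;

    public class ExternalizedConfigExample {
        public static void main(String[] args) throws IOException {
            // All client tuning lives in a file the deployment tooling owns
            // (version control, review, audit), not in application code.
            Properties props = new Properties();
            try (FileInputStream in = new FileInputStream("/etc/myapp/producer.properties")) {
                props.load(in);
            }
            // Hand the raw key-value pairs straight to the client; it parses
            // and validates them against its ConfigDef. Serializers and any
            // other required settings are expected to come from the file too.
            Producer<byte[], byte[]> producer = new KafkaProducer<>(props);
            producer.close();
        }
    }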
>>>>>> Let me now give some details of the config support classes in kafka.common.config and how they are intended to be used.
>>>>>>
>>>>>> The goals of this code are the following:
>>>>>> 1. Make specifying configs and their expected type (strings, numbers, lists, etc.) simple and declarative.
>>>>>> 2. Allow for simple validation checks (numeric range checks, etc.).
>>>>>> 3. Make the config "self-documenting", i.e. we should be able to write code that generates the configuration documentation off the config def.
>>>>>> 4. Specify default values.
>>>>>> 5. Track which configs actually get used.
>>>>>> 6. Make it easy to get config values.
>>>>>>
>>>>>> There are two classes: ConfigDef and AbstractConfig. ConfigDef defines the specification of the accepted configurations, and AbstractConfig is a helper class for implementing the configuration class. The difference is kind of like the difference between a "class" and an "object": ConfigDef is for specifying the configurations that are accepted; AbstractConfig is the base class for an instance of these configs.
>>>>>>
>>>>>> You can see this in action here:
>>>>>>
>>>>>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=blob_plain;f=clients/src/main/java/kafka/clients/producer/ProducerConfig.java;hb=HEAD
>>>>>>
>>>>>> (Ignore the static config names in there for now... I'm not actually sure that is the best approach.)
>>>>>>
>>>>>> So the way this works is that the config specification is defined as:
>>>>>>
>>>>>>     config = new ConfigDef().define("bootstrap.brokers", Type.LIST, "documentation")
>>>>>>                             .define("metadata.timeout.ms", Type.LONG, 60 * 1000, atLeast(0), "documentation")
>>>>>>                             .define("max.partition.size", Type.INT, 16384, atLeast(0), "documentation");
>>>>>>
>>>>>> This is used in a ProducerConfig class which extends AbstractConfig to get access to some helper methods as well as the logic for tracking which configs get accessed.
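Putting the two halves together, a config class in this style comes out roughly as below. This is a sketch modeled on the snippet above, not the actual ProducerConfig; the kafka.common.config package path is the one named in the thread, and the constructor and getter shapes are assumptions based on the design as described.

    import java.util.Map;

    import kafka.common.config.AbstractConfig;
    import kafka.common.config.ConfigDef;
    import kafka.common.config.ConfigDef.Type;

    import static kafka.common.config.ConfigDef.Range.atLeast;

    public class MyClientConfig extends AbstractConfig {

        // The "class" side: which keys exist, their types, defaults,
        // validators, and documentation strings.
        private static final ConfigDef CONFIG = new ConfigDef()
                .define("bootstrap.brokers", Type.LIST, "documentation")
                .define("metadata.timeout.ms", Type.LONG, 60 * 1000, atLeast(0), "documentation")
                .define("max.partition.size", Type.INT, 16384, atLeast(0), "documentation");

        // The "object" side: one parsed, type-checked instance of those configs.
        public MyClientConfig(Map<?, ?> originals) {
            super(CONFIG, originals);
        }

        public long metadataTimeoutMs() {
            // Typed accessor; the read is also recorded, which is what makes
            // unused-key detection possible.
            return getLong("metadata.timeout.ms");
        }
    }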
>>>>>> Currently I have included static String variables for each of the config names in that class. However, I actually think that is not very helpful, as the javadoc for them doesn't give the constant value and requires duplicating the documentation. To understand this point, look at the javadoc and note that the doc on the string is not the same as what we define in the ConfigDef. We could just have the javadoc for the config string be the source of truth, but it is actually pretty inconvenient for that, as it doesn't show you the value of the constant, just the variable name (unless you discover how to unhide it). That is fine for the clients, but for the server it would be very weird, especially for non-Java people. We could attempt to duplicate documentation between the javadoc and the ConfigDef, but given our struggle to get well-documented config in a single place this seems unwise.
>>>>>>
>>>>>> So I recommend we have a single source for documentation of these configs: the website documentation on configuration, covering clients and server, generated off the config defs. The javadoc on KafkaProducer will link to this table, so it should be quite convenient to discover. This makes things a little more typo-prone, but that should be easily caught by the unused-key detection. This will also make it possible for us to retire configs in the future without causing compile failures, and to add configs without having use of them break backwards compatibility. This is useful during upgrades, where you want to be compatible with both the old and new version so you can roll forwards and backwards.
>>>>>>
>>>>>> -Jay
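And the "generated off the config defs" recommendation is essentially a short loop, since the ConfigDef already holds every key's name, type, default, and doc string. A hypothetical generator; configKeys() and the public ConfigKey fields are assumed accessors, not confirmed API (later Kafka versions grew a built-in ConfigDef.toHtmlTable() that does essentially this):

    import java.util.Map;

    import kafka.common.config.ConfigDef;

    public class ConfigDocGenerator {

        // Render the website's configuration table from the single source
        // of truth: the ConfigDef itself.
        public static String toHtmlTable(ConfigDef def) {
            StringBuilder b = new StringBuilder("<table>\n");
            b.append("<tr><th>Name</th><th>Type</th><th>Default</th><th>Description</th></tr>\n");
            for (Map.Entry<String, ConfigDef.ConfigKey> entry : def.configKeys().entrySet()) {
                ConfigDef.ConfigKey key = entry.getValue();
                b.append("<tr><td>").append(key.name)
                 .append("</td><td>").append(key.type)
                 .append("</td><td>").append(key.defaultValue)
                 .append("</td><td>").append(key.documentation)
                 .append("</td></tr>\n");
            }
            return b.append("</table>\n").toString();
        }
    }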