Re: Config for new clients (and server)

Jay Kreps Wed, 05 Feb 2014 15:47:45 -0800

And you guys like the existing helper code for that?

-Jay



On Wed, Feb 5, 2014 at 10:17 AM, Neha Narkhede <neha.narkh...@gmail.com>wrote:

> I'm not so sure about the static config names used in the producer, but I'm
> +1 on using the key value approach for configs to ease operability.
>
> Thanks,
> Neha
>
>
> On Wed, Feb 5, 2014 at 10:10 AM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > +1 for the key-value approach.
> >
> > Guozhang
> >
> >
> > On Tue, Feb 4, 2014 at 9:34 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
> >
> > > We touched on this a bit in previous discussions, but I wanted to draw
> > out
> > > the approach to config specifically as an item of discussion.
> > >
> > > The new producer and consumer use a similar key-value config approach
> as
> > > the existing scala clients but have different implementation code to
> help
> > > define these configs. The plan is to use the same approach on the
> server,
> > > once the new clients are complete; so if we agree on this approach it
> > will
> > > be the new default across the board.
> > >
> > > Let me split this into two parts. First I will try to motivate the use
> of
> > > key-value pairs as a configuration api. Then let me discuss the
> mechanics
> > > of specifying and parsing these. If we agree on the public api then the
> > > public api then the implementation details are interesting as this will
> > be
> > > shared across producer, consumer, and broker and potentially some
> tools;
> > > but if we disagree about the api then there is no point in discussing
> the
> > > implementation.
> > >
> > > Let me explain the rationale for this. In a sense a key-value map of
> > > configs is the worst possible API to the programmer using the clients.
> > Let
> > > me contrast the pros and cons versus a POJO and motivate why I think it
> > is
> > > still superior overall.
> > >
> > > Pro: An application can externalize the configuration of its kafka
> > clients
> > > into its own configuration. Whatever config management system the
> client
> > > application is using will likely support key-value pairs, so the client
> > > should be able to directly pull whatever configurations are present and
> > use
> > > them in its client. This means that any configuration the client
> supports
> > > can be added to any application at runtime. With the pojo approach the
> > > client application has to expose each pojo getter as some config
> > parameter.
> > > The result of many applications doing this is that the config is
> > different
> > > for each and it is very hard to have a standard client config shared
> > > across. Moving config into config files allows the usual tooling
> (version
> > > control, review, audit, config deployments separate from code pushes,
> > > etc.).
> > >
> > > Pro: Backwards and forwards compatibility. Provided we stick to our
> java
> > > api many internals can evolve and expose new configs. The application
> can
> > > support both the new and old client by just specifying a config that
> will
> > > be unused in the older version (and of course the reverse--we can
> remove
> > > obsolete configs).
> > >
> > > Pro: We can use a similar mechanism for both the client and the server.
> > > Since most people run the server as a stand-alone process it needs a
> > config
> > > file.
> > >
> > > Pro: Systems like Samza that need to ship configs across the network
> can
> > > easily do so as configs have a natural serialized form. This can be
> done
> > > with pojos using java serialization but it is ugly and has bizare
> failure
> > > cases.
> > >
> > > Con: The IDE gives nice auto-completion for pojos.
> > >
> > > Con: There are some advantages to javadoc as a documentation mechanism
> > for
> > > java people.
> > >
> > > Basically to me this is about operability versus niceness of api and I
> > > think operability is more important.
> > >
> > > Let me now give some details of the config support classes in
> > > kafka.common.config and how they are intended to be used.
> > >
> > > The goal of this code is the following:
> > > 1. Make specifying configs, their expected type (string, numbers,
> lists,
> > > etc) simple and declarative
> > > 2. Allow for validating simple checks (numeric range checks, etc)
> > > 3. Make the config "self-documenting". I.e. we should be able to write
> > code
> > > that generates the configuration documentation off the config def.
> > > 4. Specify default values.
> > > 5. Track which configs actually get used.
> > > 6. Make it easy to get config values.
> > >
> > > There are two classes there: ConfigDef and AbstractConfig. ConfigDef
> > > defines the specification of the accepted configurations and
> > AbstractConfig
> > > is a helper class for implementing the configuration class. The
> > difference
> > > is kind of like the difference between a "class" and an "object":
> > ConfigDef
> > > is for specifying the configurations that are accepted, AbstractConfig
> is
> > > the base class for an instance of these configs.
> > >
> > > You can see this in action here:
> > >
> > >
> >
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=blob_plain;f=clients/src/main/java/kafka/clients/producer/ProducerConfig.java;hb=HEAD
> > >
> > > (Ignore the static config names in there for now...I'm not actually
> sure
> > > that is the best approach).
> > >
> > > So the way this works is that the config specification is defined as:
> > >
> > >         config = new ConfigDef().define("bootstrap.brokers", Type.LIST,
> > > "documentation")
> > >
> > >                                 .define("metadata.timeout.ms",
> > Type.LONG,
> > > 60 * 1000, atLeast(0), "documentation")
> > >                                 .define("max.partition.size", Type.INT,
> > > 16384, atLeast(0), "documentation")
> > >
> > >
> > > This is used in a ProducerConfig class which extends AbstractConfig to
> > get
> > > access to some helper methods as well as the logic for tracking which
> > > configs get accessed.
> > >
> > > Currently I have included static String variables for each of the
> config
> > > names in that class. However I actually think that is not very helpful
> as
> > > the javadoc for them doesn't give the constant value and requires
> > > duplicating the documentation. To understand this point look at the
> > javadoc
> > > and note that the doc on the string is not the same as what we define
> in
> > > the ConfigDef. We could just have the javadoc for the config string be
> > the
> > > source of truth but it is actually pretty inconvient for that as it
> > doesn't
> > > show you the value of the constant, just the variable name (unless you
> > > discover how to unhide it). That is fine for the clients, but for the
> > > server would be very weird especially for non-java people. We could
> > attempt
> > > to duplicate documentation between the javadoc and the ConfigDef but
> > given
> > > our struggle to get well-documented config in a single place this seems
> > > unwise.
> > >
> > > So I recommend we have a single source for documentation of these and
> > that
> > > that source be the website documentation on configuration that covers
> > > clients and server and that that be generated off the config defs. The
> > > javadoc on KafkaProducer will link to this table so it should be quite
> > > convenient to discover. This makes things a little more typo prone, but
> > > that should be easily caught by the key detection. This will also make
> it
> > > possible for us to retire configs in the future without causing compile
> > > failures and add configs without having use of them break backwards
> > > compatibility. This is useful during upgrades where you want to be
> > > compatible with the old and new version so you can roll forwards and
> > > backwards.
> > >
> > > -Jay
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>

Re: Config for new clients (and server)

Reply via email to