Jay,

Can we add another package (or two) to org.apache.kafka.common for metadata
and consensus? We could call them something else, but the idea would be to
have one common layer for metadata information (right now we put the JSON
into ZooKeeper) and one common layer for asynchronous watches (where we wait
for ZooKeeper to call us). It would be great to have that code be something
we can wrap ZkClient (or Curator) around, insulating us from the different
options growing in both of those areas.

For both the metadata code and the async watches we would be able to run any
class we load in that supports the expected interface. The async watch
interface can take a callback as an input to pass to the loaded class, so
that when the watcher fires (regardless of whether it is from etcd or
ZooKeeper) the code gets the response it expected and needed. We should also
expose a function on the watcher that returns a future.
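To make the idea concrete, here is a minimal sketch of what such a pluggable watch interface could look like. All names here (AsyncWatcher, InMemoryWatcher, fire) are hypothetical illustrations, not existing Kafka or ZooKeeper classes; a real plugin would wrap ZkClient, Curator, or an etcd client behind the same contract.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical pluggable watch contract -- illustrative only. */
interface AsyncWatcher {
    /** Register a callback fired whenever the watched path changes, regardless of backend. */
    void watch(String path, Consumer<byte[]> callback);

    /** Future-based variant: completes with the new value on the next change. */
    CompletableFuture<byte[]> watchOnce(String path);
}

/** Trivial in-memory stand-in showing how any backend could satisfy the contract. */
class InMemoryWatcher implements AsyncWatcher {
    private final Map<String, List<Consumer<byte[]>>> watchers = new ConcurrentHashMap<>();

    public void watch(String path, Consumer<byte[]> callback) {
        watchers.computeIfAbsent(path, p -> new CopyOnWriteArrayList<>()).add(callback);
    }

    public CompletableFuture<byte[]> watchOnce(String path) {
        CompletableFuture<byte[]> future = new CompletableFuture<>();
        // Completing an already-completed future is a no-op, so repeat fires are harmless.
        watch(path, future::complete);
        return future;
    }

    /** Simulates a change notification arriving from the backing store. */
    void fire(String path, byte[] value) {
        watchers.getOrDefault(path, List.of()).forEach(cb -> cb.accept(value));
    }
}
```

The caller's code is identical whether the events come from ZooKeeper or etcd; only the loaded implementation class changes.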

This may also cause a little more work if we wanted to take the JSON and
turn it into a byte structure ... or do we just keep to the JSON and keep
making it describable and self-documenting?

I think the metadata information is separate because that data (outside of
Kafka) already resides in other systems like databases and/or caches. Folks
may opt to switch just the metadata out, reducing ZooKeeper's burden to only
the asynchronous watchers. Some folks may want to swap both out.
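A sketch of the metadata side of the split, again with purely hypothetical names (MetadataStore is not an existing Kafka interface): the point is that a map-backed, database-backed, or ZooKeeper-backed implementation could all satisfy the same small contract.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical metadata-layer contract -- illustrative, not an actual Kafka API. */
interface MetadataStore {
    /** Store the metadata document (JSON today) under a logical key. */
    void put(String key, String json);

    /** Fetch the document if present, whatever backend holds it. */
    Optional<String> get(String key);
}

/** Map-backed sketch; a real plugin might wrap ZooKeeper, a database, or a cache. */
class InMemoryMetadataStore implements MetadataStore {
    private final Map<String, String> data = new ConcurrentHashMap<>();

    public void put(String key, String json) { data.put(key, json); }

    public Optional<String> get(String key) { return Optional.ofNullable(data.get(key)); }
}
```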

These two layers could also just be 2-3 more files in utils.

- Joestein

On Sun, Feb 8, 2015 at 11:04 AM, Gwen Shapira <gshap...@cloudera.com> wrote:

> Thanks for the background.
>
> I picked the Network classes portion of it, since I was already looking at
> how to refactor send/receive and friends to support extending with TLS and
> SASL. Having to do this in just one place will be really nice :)
>
> Gwen
>
> On Sun, Feb 8, 2015 at 7:26 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
> > Hey all,
> >
> > Someone asked about why there is code duplication between
> org.apache.common
> > and core. The answer seemed like it might be useful to others, so
> including
> > it here:
> >
> > Originally Kafka was more of a proof of concept and we didn't separate
> the
> > clients from the server. LinkedIn was much smaller and it wasn't open
> > source, and keeping those separate always adds a lot of overhead. So we
> > ended up with just one big jar.
> >
> > Next thing we know the kafka jar is embedded everywhere. Lots of fallout
> > from that:
> > - It has to be really sensitive to dependencies
> > - Scala causes all kinds of pain for users. Ironically it causes the most
> > pain for people using scala because of compatibility. I think the single
> > biggest Kafka complaint was the scala clients and resulting scary
> > exceptions, lack of javadoc, etc.
> > - Many of the client interfaces weren't well thought out as permanent
> > long-term commitments.
> > - We knew we had to rewrite both clients due to technical deficiencies
> > anyway. The clients really needed to move to non-blocking I/O, which is
> > basically a rewrite on its own.
> >
> > So how to go about that?
> >
> > Well we felt we needed to maintain the old client interfaces for a good
> > period of time. Any kind of breaking cut-over was kind of a non-starter.
> > But a major refactoring in place was really hard since so many classes
> were
> > public and so little attention had been paid to the difference between
> > public and private classes.
> >
> > Naturally since the client and server do the inverse of each other there
> is
> > a ton of shared logic. So we thought we needed to break it up into three
> > independent chunks:
> > 1. common - shared helper code used by both clients and server
> > 2. clients - the producer, consumer, and eventually admin java
> interfaces.
> > This depends on common.
> > 3. server - the server (and legacy clients). This is currently called
> core.
> > This will depend on common and clients (because sometimes the server
> needs
> > to make client requests)
> >
> > Common and clients were left as a single jar and just logically separate
> so
> > that people wouldn't have to deal with two jars (and hence the
> possibility
> > of getting different versions of each).
> >
> > The dependency is actually a little counter-intuitive to people--they
> > usually think of the client as depending on the server since the client
> > calls the server. But in terms of code dependencies it is the other
> way--if
> > you depend on the client you obviously don't want to drag in the server.
> >
> > So to get all this done we decided to just go big and do a rewrite of the
> > clients in Java. A result of this is that any shared code would have to
> > move to Java (so the clients don't pull in Scala). We felt this was
> > probably a good thing in its own right as it gave a chance to improve a
> few
> > of these utility libraries like config parsing, etc.
> >
> > So the plan was and is:
> > 1. Rewrite producer, release and roll out
> > 2a. Rewrite consumer, release and roll out
> > 2b. Migrate server from scala code to org.apache.common classes
> > 3. Deprecate scala clients
> >
> > (2a) is in flight now, and that means (2b) is totally up for grabs. Of
> > these the request conversion is definitely the most pressing since having
> > those defined twice duplicates a ton of work. We will have to be
> > hyper-conscientious during the conversion about making the shared code in
> > common really solve the problem well and conveniently on the server as
> well
> > (so we don't end up just shoe-horning it in). My hope is that we can
> treat
> > this common code really well--it isn't as permanent as the public classes
> > but ends up heavily used so we should take good care of it. Most of the
> shared
> > code is private so we can refactor the stuff in common to meet the needs
> of
> > the server if we find mismatches or missing functionality. I tried to
> keep
> > in mind the eventual server usage while writing it, but I doubt it will
> be
> > as trivial as just deleting the old and adding the new.
> >
> > In terms of the simplicity:
> > - Converting exceptions should be trivial
> > - Converting utils is straightforward but we should evaluate the
> > individual utilities and see if they actually make sense, have tests, are
> > used, etc.
> > - Converting the requests may not be too complex but touches a huge hunk
> of
> > code and may require some effort to decouple the network layer.
> > - Converting the network code will be delicate and may require some
> changes
> > in org.apache.common.network to meet the server's needs
> >
> > This is all a lot of work, but if we stick to it at the end we will have
> > really nice clients and a nice modular code base. :-)
> >
> > Cheers,
> >
> > -Jay
> >
>
