Hi all,

I have started a proposal to contribute a table (aka key/value) service
component as a contrib module to the bookkeeper community. This BP, together
with the other BPs I sent last week, forms a plan for how we can improve
metadata management in bookkeeper (I will talk a bit more about this at the
end of this email).

**why it was developed**

Two main categories of use cases drive the need for a key/value-like
service.

One is metadata storage. Bookkeeper needs a key/value-like store
(currently zookeeper) to hold ledger metadata, and systems built on top of
bookkeeper, such as distributedlog and pulsar, follow the same pattern.
They all need a key/value-like store for their metadata. We all know
zookeeper is a scalability bottleneck. It is also a source of issues in
production systems (based on my own, admittedly biased, production
experience).

The other one is state storage in real-time/streaming
analytics/computation. In streaming analytics, computation jobs usually
process streaming data. They usually need to store some sort of state for
the computation operators in a storage system and serve that state as
final results for queries. This state is usually represented in key/value
form and usually backed by a write-ahead log (WAL). BookKeeper has been
used in this area via distributedlog/pulsar for storing and serving log /
streaming data. It would be ideal for bookkeeper to also be able to store
and serve state data, for the sake of unification and simplification, and
to reduce the complexity of deployment and operations.

Hence we prototyped/developed a table service component as an add-on to
bookkeeper. We'd like to contribute this as a contrib module to bookkeeper
and continue the development, integration and evaluation in the bookkeeper
community.

We hope this can follow the path bookkeeper took in zookeeper: bookkeeper
started as a contrib module in zookeeper, was developed in that community,
and grew into what it is now.

**how it is aligned with metadata storage**

BP-28, BP-29 and BP-30 are related to some extent.

BP-28 is more of a cleanup proposal that carries on Jia's work (on service
discovery interfaces). Its goal is to produce a clean metadata api module,
define a clean dependency between the bookkeeper implementation and the
metadata service, and allow us to really plug in different metadata
services without touching or changing the bookkeeper implementation.

BP-29 and BP-30 can be thought of as two different metadata service
implementations based on the metadata api contract defined in BP-28.

BP-29 is to use Etcd as the metadata service, while BP-30 is to have a
built-in key/value service as the metadata service. Both BP-29 and BP-30
have pros and cons. However, they do not conflict with each other. Allowing
the two approaches to proceed concurrently will help us understand more
about metadata management in bookkeeper and its ecosystem (e.g. dlog,
pulsar), which will lead the project in a healthy direction.

**Proposed Changes**

This proposal is to contribute this table service as a contrib module under
the `stream` directory, just as we handle `dlog`. We can mark it as
"preview"/"alpha" in 4.7 and continue the development of this module in
the bookkeeper community.

The details of the proposal can be found in the google doc linked below:

https://docs.google.com/document/d/155xAwWv5IdOitHh1NVMEwCMGgB28M3FyMiQSxEpjE-Y/edit#heading=h.56rbh52koe3f

Please take a look. Comments are welcome.

- Sijie
