On Tue, Feb 13, 2018 at 6:35 AM Enrico Olivelli <eolive...@gmail.com> wrote:
> (Maybe it is better to comment on the google doc, but these are very
> high-level questions.)
>
> Some questions:
> 1) I see we initially still need zookeeper. It would be interesting to
> know if you want to drop it completely in the future. Certainly the usage
> of zk in this case will be very limited, because it will only have to
> support Helix.

This BP does not directly remove zookeeper. It is about providing a
key/value service that can later be used for storing user ledger metadata.
That would reduce the metadata kept in zookeeper to a few system ledgers.
We will have a separate BP to address the metadata bootstrap problem.

In other words, we need a key/value service in place before we can talk
about removing zookeeper. That is why the key/value service comes first.

> 2) I see we are going to use DL for checkpoints, which in turn may need
> this system (as one motivation is the support of DL metadata). It seems
> to me some kind of circular dependency. Can you explain how to implement
> this?

DL is used because of its namespacing and reopenable features. The
situation is similar to bookkeeper ledger metadata itself: some system
dlogs will still use zookeeper, but that will be addressed by the metadata
bootstrap work in a separate BP once the key/value service is mature (as
mentioned in 1).

> 3) I see that checkpoints will be done by copying raw rocksdb files. I
> have no experience with RocksDB. Is it safe to directly copy files and
> obtain a consistent snapshot?

https://github.com/facebook/rocksdb/wiki/Checkpoints

> 4) If BookKeeper needs this service for metadata, and checkpoints are
> written to BK itself, how can the system boot? Surely I am missing one
> piece.

See the comment at 1).

> Great work
>
> Enrico
>
> On Mon, Feb 12, 2018 at 02:07, Sijie Guo <guosi...@gmail.com> wrote:
>
> > Thanks JV and Enrico.
> >
> > I would like to include this as a contrib in bookkeeper for 4.7, just as
> > bookkeeper itself grew from a contrib in zookeeper.
> >
> > So if the idea sounds good to you guys, and if you think this is
> > aligned with the bookkeeper roadmap, let's try to move this forward as
> > a contrib module in bookkeeper and continue the development in
> > bookkeeper.
> >
> > If there are no major concerns, I would like to call a vote this week.
> >
> > Sijie
> >
> >
> > On Thu, Feb 8, 2018 at 12:01 AM Venkateswara Rao Jujjuri
> > <jujj...@gmail.com> wrote:
> >
> > > A great step forward. BP-29 and BP-30, along with reorganizing ZK,
> > > will help BK shape a perfect MDS abstraction.
> > > While BP-30 is ambitious, it is a perfect way to start ambitious
> > > projects.
> > > :)
> > >
> > > JV
> > >
> > > On Wed, Feb 7, 2018 at 6:49 AM, Enrico Olivelli <eolive...@gmail.com>
> > > wrote:
> > >
> > > > It is very interesting! Thank you.
> > > > I will look into it soon.
> > > >
> > > > Enrico
> > > >
> > > > On Wed, Feb 7, 2018 at 15:24, Sijie Guo <guosi...@gmail.com> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I started a proposal to contribute a table (aka key/value)
> > > > > service component as a contrib module to the bookkeeper
> > > > > community. This BP, together with the other BPs I sent last week,
> > > > > forms the idea of how we can improve metadata management in
> > > > > bookkeeper (I will talk a bit more about this at the end of this
> > > > > email).
> > > > >
> > > > > **why it was developed**
> > > > >
> > > > > Two main categories of use cases drove the need for a
> > > > > key/value-like service.
> > > > >
> > > > > One is metadata storage. bookkeeper needs a key/value-like
> > > > > storage (currently zookeeper) to store ledger metadata. Systems
> > > > > built on top of bookkeeper, like distributedlog/pulsar, follow
> > > > > the same pattern: they all need a key/value-like storage to store
> > > > > their metadata. We all know zookeeper is the scalability
> > > > > bottleneck.
> > > > > It is also a source of issues in production systems (based on my
> > > > > biased production experience).
> > > > >
> > > > > The other is state storage in real-time/streaming
> > > > > analytics/computation. In streaming analytics, computation jobs
> > > > > usually process streaming data. They usually need to store some
> > > > > sort of state for the computation operators in a storage system
> > > > > and serve that state as final results for queries. The state is
> > > > > usually represented in key/value form and usually backed by a
> > > > > WAL. BookKeeper has been used in this area via
> > > > > distributedlog/pulsar for storing and serving log/streaming data.
> > > > > It would be ideal if bookkeeper could also store and serve state
> > > > > data, for the sake of unification and simplification, and to
> > > > > reduce the complexity of deployment and operations.
> > > > >
> > > > > Hence we prototyped/developed a table service component as an
> > > > > add-on to bookkeeper. We'd like to contribute this as a contrib
> > > > > module to bookkeeper and continue the development, integration
> > > > > and evaluation in the bookkeeper community.
> > > > >
> > > > > We hope this can be like bookkeeper in zookeeper: bookkeeper was
> > > > > a contrib module in zookeeper, was developed in that community,
> > > > > and grew into what it is now.
> > > > >
> > > > > **how it is aligned with metadata storage**
> > > > >
> > > > > BP-28, BP-29 and BP-30 are related to some extent.
> > > > >
> > > > > BP-28 is more of a cleanup proposal to carry on Jia's work (on
> > > > > service discovery interfaces).
> > > > > This is to produce a clean metadata api module, define a clean
> > > > > dependency between the bookkeeper implementation and the metadata
> > > > > service, and allow us to really plug in different metadata
> > > > > services without touching/changing the bookkeeper implementation.
> > > > >
> > > > > BP-29 and BP-30 can be thought of as two different metadata
> > > > > service implementations based on the metadata api contract
> > > > > defined in BP-28.
> > > > >
> > > > > BP-29 is to use Etcd as the metadata service, while BP-30 is to
> > > > > have a built-in key/value service as the metadata service. Both
> > > > > BP-29 and BP-30 have pros and cons. However, they are not opposed
> > > > > to each other. Allowing two concurrent approaches will help us
> > > > > understand more about metadata management in bookkeeper and its
> > > > > ecosystem (e.g. dlog, pulsar), which will steer the project in a
> > > > > healthy direction.
> > > > >
> > > > > **Proposed Changes**
> > > > >
> > > > > This proposal is to add this table service as a contrib module
> > > > > under the `stream` directory, just as we handle `dlog`. We can
> > > > > mark it as "preview"/"alpha" in 4.7 and continue the development
> > > > > of this module in the bookkeeper community.
> > > > >
> > > > > The details of the proposal can be found in the google doc linked
> > > > > below:
> > > > >
> > > > > https://docs.google.com/document/d/155xAwWv5IdOitHh1NVMEwCMGgB28M3FyMiQSxEpjE-Y/edit#heading=h.56rbh52koe3f
> > > > >
> > > > > Please take a look. Comments are welcome.
> > > > >
> > > > > - Sijie
> > > >
> > > > --
> > > > -- Enrico Olivelli
> > >
> > > --
> > > Jvrao
> > > ---
> > > First they ignore you, then they laugh at you, then they fight you,
> > > then you win. - Mahatma Gandhi
> >
> --
> -- Enrico Olivelli