On Tue, Feb 13, 2018 at 2:35 PM Enrico Olivelli <eolive...@gmail.com> wrote:
> All clear Sijie.
> IMHO many API function names do not follow the usual Java naming conventions. I
> know this is a large code contribution, but do you think it will be
> possible to discuss 'names' before releasing the first version?

Which ones don't follow the Java naming conventions? They should follow the bookkeeper conventions, since we are making it part of bk.

>
>
> Last question: this will be a contrib module, are we going to release it
> with regular releases of BK?

Yes, it will be released with the regular releases. We will keep it as contrib or preview for a few releases before declaring it mature.

>
> Thanks
> Enrico
>
> On Tue, Feb 13, 2018, 00:47 Sijie Guo <guosi...@gmail.com> wrote:
>
> > On Tue, Feb 13, 2018 at 6:35 AM Enrico Olivelli <eolive...@gmail.com>
> > wrote:
> >
> > > (Maybe it is better to comment on the google doc, but these are very
> > > high level questions)
> > >
> > > Some questions:
> > > 1) I see we initially still need zookeeper; it would be interesting to
> > > know if you want to drop it completely in the future. Certainly the
> > > usage of zk in this case will be very limited, because it will only
> > > have to support Helix.
> > >
> > >
> > This BP is not directly removing zookeeper. This BP is more about
> > providing a key value service, which can later be used for storing user
> > ledger metadata. It can then reduce the metadata kept in zookeeper to
> > that of some system ledgers. We will have a separate BP to address the
> > metadata bootstrap problem.
> >
> > That means we need a key/value service in place before we talk about
> > removing zookeeper. That is why the intention is to have key/value
> > first.
> >
> > >
> > >
> > > 2) I see we are going to use DL for checkpoints, which in turn may need
> > > this system (as one motivation is the support of DL metadata); this
> > > seems to me like some kind of circular dependency. Can you explain how
> > > to implement this?
> >
> > We use DL because of its namespacing and reopenable features. It is
> > similar to bookkeeper ledger metadata itself. Some system dlogs will
> > still use zookeeper, but that will be addressed by metadata bootstrap in
> > a separate BP once key/value is mature (as mentioned in 1).
> >
> > >
> > >
> > > 3) I see that checkpoints will be done by copying raw rocksdb files. I
> > > have no experience with RocksDB; is it safe to directly copy files and
> > > obtain a consistent snapshot?
> >
> > https://github.com/facebook/rocksdb/wiki/Checkpoints
> >
> > >
> > >
> > > 4) If BookKeeper needs this service for metadata and checkpoints are
> > > written to BK itself, how can the system boot? Surely I am missing one
> > > piece.
> >
> >
> > See the comment at 1)
> >
> > >
> > >
> > > Great work
> > >
> > > Enrico
> > >
> > > On Mon, Feb 12, 2018, 02:07 Sijie Guo <guosi...@gmail.com> wrote:
> > >
> > > > Thanks JV and Enrico.
> > > >
> > > > I would like to include this as a contrib module in bookkeeper for
> > > > 4.7, just as bookkeeper itself grew from a contrib module in
> > > > zookeeper.
> > > >
> > > > So if the idea sounds good to you guys, and if you think it is
> > > > aligned with the bookkeeper roadmap, let's try to move this forward
> > > > with a contrib module in bookkeeper and continue the development in
> > > > bookkeeper.
> > > >
> > > > If there are no major concerns, I would like to call a vote this
> > > > week.
> > > >
> > > > Sijie
> > > >
> > > >
> > > > On Thu, Feb 8, 2018 at 12:01 AM Venkateswara Rao Jujjuri <
> > > > jujj...@gmail.com> wrote:
> > > >
> > > > > A great step to move forward. BP-29 and BP-30, along with
> > > > > reorganizing ZK, will help BK shape a perfect MDS abstraction.
> > > > > While BP-30 is ambitious, it is a perfect way to start ambitious
> > > > > projects.
> > > > > :)
> > > > >
> > > > > JV
> > > > >
> > > > > On Wed, Feb 7, 2018 at 6:49 AM, Enrico Olivelli <eolive...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > It is very interesting! Thank you.
> > > > > > I will look into it soon
> > > > > >
> > > > > > Enrico
> > > > > >
> > > > > > On Wed, Feb 7, 2018, 15:24 Sijie Guo <guosi...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I started a proposal to contribute a table (aka key/value)
> > > > > > > service component as a contrib module to the bookkeeper
> > > > > > > community. This BP, together with the other BPs I sent last
> > > > > > > week, forms the idea of how we can improve metadata management
> > > > > > > in bookkeeper (I will talk a bit more about this at the end of
> > > > > > > this email).
> > > > > > >
> > > > > > > **why it was developed**
> > > > > > >
> > > > > > > Two main categories of use cases were driving the need for a
> > > > > > > key/value-like service.
> > > > > > >
> > > > > > > One is metadata storage. Bookkeeper needs a key/value-like
> > > > > > > storage (currently zookeeper) to store ledger metadata, and
> > > > > > > systems built on top of bookkeeper, like distributedlog/pulsar,
> > > > > > > follow the same pattern: they all need a key/value-like storage
> > > > > > > to store their metadata. We all know zookeeper is the
> > > > > > > scalability bottleneck. It is also a frequent source of issues
> > > > > > > in production systems (based on my biased production
> > > > > > > experience).
> > > > > > >
> > > > > > > The other one is state storage in real-time/streaming
> > > > > > > analytics/computation. In streaming analytics, the computation
> > > > > > > jobs usually process streaming data.
> > > > > > > They usually need to store some sort of state of the
> > > > > > > computation operators in a storage and serve the computation
> > > > > > > state as final results for queries. That state is usually
> > > > > > > represented in key/value form and backed by a WAL. BookKeeper
> > > > > > > has been used in this area via distributedlog/pulsar for
> > > > > > > storing and serving log/streaming data. It would be ideal for
> > > > > > > bookkeeper to also be able to store and serve state data, for
> > > > > > > the sake of unification and simplification, and to reduce the
> > > > > > > complexity of deployment and operations.
> > > > > > >
> > > > > > > Hence we prototyped/developed a table service component as an
> > > > > > > add-on to bookkeeper. We'd like to contribute this as a contrib
> > > > > > > module to bookkeeper and continue the development, integration
> > > > > > > and evaluation in the bookkeeper community.
> > > > > > >
> > > > > > > We hope this can follow the path of bookkeeper in zookeeper:
> > > > > > > bookkeeper was a contrib module in zookeeper, and it was
> > > > > > > developed in the community and grew into what it is now.
> > > > > > >
> > > > > > > **how it is aligned with metadata storage**
> > > > > > >
> > > > > > > BP-28, BP-29 and BP-30 are related to some extent.
> > > > > > >
> > > > > > > BP-28 is more of a cleanup proposal to carry on Jia's work (on
> > > > > > > service discovery interfaces). It aims to produce a clean
> > > > > > > metadata API module, define a clean dependency between the
> > > > > > > bookkeeper implementation and the metadata service, and allow
> > > > > > > us to really plug in different metadata services without
> > > > > > > touching/changing the bookkeeper implementation.
> > > > > > >
> > > > > > > BP-29 and BP-30 can be thought of as two different metadata
> > > > > > > service implementations based on the metadata API contract
> > > > > > > defined in BP-28.
> > > > > > >
> > > > > > > BP-29 is to use Etcd as the metadata service, while BP-30 is to
> > > > > > > have a built-in key/value service as the metadata service. Both
> > > > > > > BP-29 and BP-30 have pros and cons. However, they do not
> > > > > > > conflict with each other. Allowing two concurrent approaches
> > > > > > > will help us understand more about metadata management in
> > > > > > > bookkeeper and its ecosystem (e.g. dlog, pulsar), which will
> > > > > > > lead the project in a healthy direction.
> > > > > > >
> > > > > > > **Proposed Changes**
> > > > > > >
> > > > > > > This proposal is to add this table service as a contrib module
> > > > > > > under the `stream` directory, just as we handle `dlog`. We can
> > > > > > > mark it as "preview"/"alpha" in 4.7 and continue the
> > > > > > > development of this module in the bookkeeper community.
> > > > > > >
> > > > > > > The details of the proposal can be found in the google doc
> > > > > > > attached below:
> > > > > > >
> > > > > > > https://docs.google.com/document/d/155xAwWv5IdOitHh1NVMEwCMGgB28M3FyMiQSxEpjE-Y/edit#heading=h.56rbh52koe3f
> > > > > > >
> > > > > > > Please take a look. Comments are welcome.
> > > > > > >
> > > > > > > - Sijie
> > > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > -- Enrico Olivelli
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Jvrao
> > > > > ---
> > > > > First they ignore you, then they laugh at you, then they fight you,
> > > > > then you win. - Mahatma Gandhi
> > > >
> > >
> > > --
> > >
> > > -- Enrico Olivelli
> >
> --
>
> -- Enrico Olivelli
>
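The BP-28/BP-29/BP-30 split discussed in this thread (one metadata API contract, with Etcd or the built-in key/value service plugged in behind it) can be illustrated with a minimal Java sketch. Every name below (`MetadataStore`, `Versioned`, `InMemoryMetadataStore`, `LedgerMetadataWriter`, and the `/ledgers/<id>` key layout) is hypothetical and is not the actual BP-28 API. The versioned conditional `put` is an assumption, modelled on the compare-and-set style updates that stores such as zookeeper provide.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// A value plus the version it was written at (hypothetical type).
final class Versioned {
    final byte[] value;
    final long version;

    Versioned(byte[] value, long version) {
        this.value = value;
        this.version = version;
    }
}

// Hypothetical metadata contract in the spirit of BP-28: bookkeeper code would
// depend only on this interface, never on a concrete metadata backend.
interface MetadataStore {
    // Returns the current value, or empty if the key does not exist.
    Optional<Versioned> get(String key);

    // Conditional write: succeeds only if the stored version equals
    // expectedVersion (-1 means "the key must not exist yet").
    // Returns the new version, or empty if the condition did not hold.
    Optional<Long> put(String key, byte[] value, long expectedVersion);
}

// Toy, non-durable backend. A real deployment would plug in zookeeper, Etcd
// (BP-29) or the built-in table service (BP-30) behind the same interface.
final class InMemoryMetadataStore implements MetadataStore {
    private final Map<String, Versioned> data = new ConcurrentHashMap<>();

    @Override
    public Optional<Versioned> get(String key) {
        return Optional.ofNullable(data.get(key));
    }

    @Override
    public synchronized Optional<Long> put(String key, byte[] value, long expectedVersion) {
        Versioned current = data.get(key);
        long currentVersion = (current == null) ? -1L : current.version;
        if (currentVersion != expectedVersion) {
            return Optional.empty(); // version mismatch, e.g. a concurrent writer won
        }
        long newVersion = currentVersion + 1;
        data.put(key, new Versioned(value.clone(), newVersion));
        return Optional.of(newVersion);
    }
}

// Client code written purely against the contract, so it works unchanged
// with any backend.
final class LedgerMetadataWriter {
    private final MetadataStore store;

    LedgerMetadataWriter(MetadataStore store) {
        this.store = store;
    }

    // Hypothetical key layout, for illustration only.
    static String key(long ledgerId) {
        return "/ledgers/" + ledgerId;
    }

    // Creates metadata for a new ledger; returns false if it already exists.
    boolean create(long ledgerId, String metadata) {
        byte[] bytes = metadata.getBytes(StandardCharsets.UTF_8);
        return store.put(key(ledgerId), bytes, -1L).isPresent();
    }
}
```

Because `LedgerMetadataWriter` only sees the `MetadataStore` interface, switching backends means passing a different implementation to its constructor, with no change to the calling code. That is the property the BP-28 paragraph asks for.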