On Wed, Sep 13, 2017 at 12:16 AM, Enrico Olivelli <eolive...@gmail.com>
wrote:

> I think that this is a good direction to go.
>
> I believe the reasons given about ZK in huge systems, even if that is not my
> case, so I cannot comment on that use case.
>
> I am fine with direction as long as we are still going to support
> ZooKeeper.
> BookKeeper is in the Hadoop / ZooKeeper ecosystem and several products rely
> on ZK too. For instance, in my systems it is usual to have
> BookKeeper/Kafka/HBase/Majordodo..., so I am not going to live without
> ZooKeeper in the short/mid term.
>
> I am really OK with dropping ZK because for "simple" systems, when you need
> only BK, the burden of setting up a ZooKeeper server is weird for customers.
> I usually redistribute BK + ZK with my applications, and we are talking about
> small clusters of up to 10 machines.
>

Just to clarify: we are not dropping ZK here. We are just proposing to
have a ledger manager implementation that doesn't depend on ZooKeeper
directly.
We are not modifying any existing ledger manager implementation.


>
> The direction of this proposal is OK for me, and it is very similar to the
> work I was starting on "standalone mode".


> I think it will be very easy to support the case of having a single bookie
> with this approach, or even client + bookie in the same JVM.
> Having multiple bookies will require us to add some other coordination
> facility between bookies. I would like to know if there is already some
> idea about this: are we going to use another product like etcd or JGroups, or
> implement our own coordination protocol?


We are not replacing A with B, and that applies to ZooKeeper too. Ledger
management is already abstracted in interfaces, so users can use whatever
system they prefer as the metadata store.

Our direction is to provide an option to store metadata as well as data in
bookies, so with this option no external metadata storage is needed.
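
As a rough illustration of what "abstracted in interfaces" means here, the
following is a minimal Java sketch. All names in it (LedgerMetadataStore,
InMemoryMetadataStore, ThinClient) are hypothetical and are not the actual
BookKeeper LedgerManager API; the in-memory map only stands in for "metadata
stored in bookies".

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Small driver showing that the client never names a concrete backend.
class MetadataStoreSketch {
    public static void main(String[] args) {
        LedgerMetadataStore store = new InMemoryMetadataStore();
        ThinClient client = new ThinClient(store);
        long id = client.open("ensemble=3,quorum=2");
        System.out.println("ledger " + id + " -> " + client.describe(id));
    }
}

// Hypothetical, simplified metadata interface (NOT the real BookKeeper API).
// A ZooKeeper-backed or a bookie-backed store would both implement it.
interface LedgerMetadataStore {
    long createLedger(byte[] metadata);
    Optional<byte[]> readLedger(long ledgerId);
    boolean removeLedger(long ledgerId);
}

// Assumed stand-in backend: keeps metadata in memory instead of an external service.
final class InMemoryMetadataStore implements LedgerMetadataStore {
    private final Map<Long, byte[]> ledgers = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    @Override
    public long createLedger(byte[] metadata) {
        long id = nextId.getAndIncrement();
        ledgers.put(id, metadata.clone()); // defensive copy
        return id;
    }

    @Override
    public Optional<byte[]> readLedger(long ledgerId) {
        return Optional.ofNullable(ledgers.get(ledgerId)).map(b -> b.clone());
    }

    @Override
    public boolean removeLedger(long ledgerId) {
        return ledgers.remove(ledgerId) != null;
    }
}

// Client code depends only on the interface, so swapping the backend
// does not require changing the client.
final class ThinClient {
    private final LedgerMetadataStore store;

    ThinClient(LedgerMetadataStore store) {
        this.store = store;
    }

    long open(String description) {
        return store.createLedger(description.getBytes(StandardCharsets.UTF_8));
    }

    String describe(long ledgerId) {
        return store.readLedger(ledgerId)
                .map(b -> new String(b, StandardCharsets.UTF_8))
                .orElse("<missing>");
    }
}
```

In this sketch, choosing a different metadata store only means passing a
different LedgerMetadataStore implementation to the client.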


> ZK is simple but it is very effective. Maybe we could help the ZK community
> to move forward and resolve the problems we are bringing to light
>
>
> Enrico
>
>
> 2017-09-13 3:15 GMT+02:00 Jia Zhai <zhaiji...@gmail.com>:
>
> > Any thoughts or comments
> > :)
> >
> > Thanks a lot.
> > -Jia
> >
> > On Tue, Sep 12, 2017 at 4:30 PM, Jia Zhai <zhaiji...@gmail.com> wrote:
> >
> > > This blog post also touches on the limitations of ZooKeeper in BookKeeper:
> > > https://bitworks.software/blog/en/2017-07-12-replicated-scalable-commitlog-with-apachebookkeeper.html
> > >
> > > On Thu, Sep 7, 2017 at 9:45 AM, Jia Zhai <zhaiji...@gmail.com> wrote:
> > >
> > >> 👍. Thanks a lot for the suggestions and feedback.
> > >>
> > >> On Thu, Sep 7, 2017 at 4:24 AM, Sijie Guo <guosi...@gmail.com> wrote:
> > >>
> > >>> On Wed, Sep 6, 2017 at 1:07 PM, Enrico Olivelli <eolive...@gmail.com
> >
> > >>> wrote:
> > >>>
> > >>> > Off topic curiosity... Jia and Sijie, do you think we are going to
> > >>> > drop ZK from DL too?
> > >>> >
> > >>>
> > >>> Yes. That's the goal: 1) for large deployments, we are trying to overcome
> > >>> the limitations of ZooKeeper; 2) for smaller deployments, it will make
> > >>> deployment much easier, since you just need to deploy a cluster of bookies.
> > >>> Once it is done, you can use the ledger API or the log stream API to
> > >>> access the BookKeeper cluster.
> > >>>
> > >>> Both DL and BK have pluggable metadata storage. They have very clear
> > >>> interfaces defining the metadata operations, so it is straightforward to
> > >>> use a different metadata storage.
> > >>>
> > >>>
> > >>> > Enrico
> > >>> >
> > >>> > On mer 6 set 2017, 19:51 Enrico Olivelli <eolive...@gmail.com>
> > wrote:
> > >>> >
> > >>> > >
> > >>> > >
> > >>> > > On mer 6 set 2017, 18:25 Sijie Guo <guosi...@gmail.com> wrote:
> > >>> > >
> > >>> > >> On Sep 6, 2017 4:57 AM, "Enrico Olivelli" <eolive...@gmail.com>
> > >>> wrote:
> > >>> > >>
> > >>> > >> Thank you Sijie and Jia for your comments and explanations,
> > >>> > >> answers inline
> > >>> > >>
> > >>> > >> 2017-09-06 2:23 GMT+02:00 Jia Zhai <zhaiji...@gmail.com>:
> > >>> > >>
> > >>> > >> > Thanks a lot Enrico and Sijie for your comments and information on
> > >>> > >> > this.
> > >>> > >> >
> > >>> > >> > On Tue, Sep 5, 2017 at 9:31 PM, Enrico Olivelli <
> > >>> eolive...@gmail.com>
> > >>> > >> > wrote:
> > >>> > >> >
> > >>> > >> > > Great to see you working on this !
> > >>> > >> > > It would be great to have such a feature, as it is the first step
> > >>> > >> > > to a 'standalone' BookKeeper mode
> > >>> > >> > >
> > >>> > >> > > Some complementary ideas/first look questions:
> > >>> > >> > > - the document does not talk about security; IMHO we have to cover
> > >>> > >> > > at least authentication and TLS. It would be great to leverage the
> > >>> > >> > > existing AuthPlugins, as they are based on exchanging byte[] (as
> > >>> > >> > > SASL wants)
> > >>> > >> > >
> > >>> > >> > [Jia] It is a good idea. We left the security part out for now for a
> > >>> > >> > few reasons: 1) to keep this BP focused on removing the ZooKeeper
> > >>> > >> > dependency from the client; 2) it is introduced as a separate
> > >>> > >> > implementation of existing interfaces, so it won’t impact the
> > >>> > >> > existing security story. And for sure, we will add the security part
> > >>> > >> > later, after this.
> > >>> > >> >
> > >>> > >>
> > >>> > >>
> > >>> > >> I am fine with that; I am only afraid that we won't be able to support
> > >>> > >> it in the (near) future. Maybe you could just mention the security
> > >>> > >> story and add some reference to how we would deal with it in the future
> > >>> > >>
> > >>> > >>
> > >>> > >> The new ledger manager will first be marked as experimental, until it
> > >>> > >> is stable and has security features.
> > >>> > >>
> > >>> > >> How does that sound?
> > >>> > >>
> > >>> > >
> > >>> > > Ok
> > >>> > >
> > >>> > >>
> > >>> > >>
> > >>> > >>
> > >>> > >> >
> > >>> > >> > - do we have some kind of "bootstrap servers list"
> configuration
> > >>> > option
> > >>> > >> ?
> > >>> > >> > > the list should be complete or just a subset of bookies ? at
> > >>> > >> connection
> > >>> > >> > the
> > >>> > >> > > client could discover the list of other bookies
> > >>> > >> > >
> > >>> > >> > [Jia] Yes, we will have a `clientBootstrapBookies` setting in the
> > >>> > >> > server set. It can be a list of bookies or simply a DNS name for the
> > >>> > >> > bookies. Will add this to the BP
> > >>> > >> >
> > >>> > >> > - will the client connect to only one bookie at a time ? how
> we
> > >>> will
> > >>> > >> deal
> > >>> > >> > > with errors ?
> > >>> > >> > >
> > >>> > >> > [Jia] It will connect to the list of bootstrap servers. gRPC will
> > >>> > >> > load balance the requests and manage the connection errors.
> > >>> > >> >
> > >>> > >> > - should the bookie write on ZK metadata its gRPC endpoint
> info
> > ?
> > >>> > (this
> > >>> > >> > > will be useful for a bookie to tell about other bookies to
> the
> > >>> > >> connected
> > >>> > >> > > clients)
> > >>> > >> > >
> > >>> > >> > [Jia] No, it won’t. We don’t see a strong reason to add it,
> > >>> > >> > especially since we may eventually eliminate ZooKeeper completely.
> > >>> > >> > It can be a fixed port `3281`, or in a scheduler-based environment it
> > >>> > >> > is very easy to have a load balancer sitting in front of those
> > >>> > >> > bookies.
> > >>> > >> >
> > >>> > >>
> > >>> > >> I think a fixed port is not a good way.
> > >>> > >> You will not be able to run more than one bookie on a single
> host.
> > >>> > >>
> > >>> > >> We should support:
> > >>> > >> - configurable port
> > >>> > >> - ephemeral port for tests
> > >>> > >>
> > >>> > >>
> > >>> > >> I think what Jia means is a configurable port, but a relatively fixed
> > >>> > >> one, in the sense that the client doesn't discover this port from
> > >>> > >> ZooKeeper.
> > >>> > >>
> > >>> > >
> > >>> > > Very good
> > >>> > >
> > >>> > >>
> > >>> > >>
> > >>> > >> Ideally I would like to have the local transport option, in order to
> > >>> > >> have a single JVM, but this is not a blocker. As we are running gRPC
> > >>> > >> on Netty it should be feasible, or we can create some kind of
> > >>> > >> short-circuit between the client and the bookie
> > >>> > >>
> > >>> > >>
> > >>> > >> gRPC supports an in-process channel, so you don't need to use the
> > >>> > >> low-level Netty settings.
> > >>> > >>
> > >>> > >
> > >>> > > Great
> > >>> > >
> > >>> > > So it sounds all good to me thanks
> > >>> > >
> > >>> > > Enrico
> > >>> > >
> > >>> > >
> > >>> > >>
> > >>> > >> I am OK with not writing this to the bookie metadata, leaving it up to
> > >>> > >> the client to have a configured list of bookies enabled for metadata
> > >>> > >> operations
> > >>> > >>
> > >>> > >>
> > >>> > >>
> > >>> > >> >
> > >>> > >> > > - the bookie will be somehow a proxy for ZooKeeper. I think that the
> > >>> > >> > > 'watch' part is the most complex: we will have to deal with
> > >>> > >> > > reconnections, errors... maybe it is worth writing more detail
> > >>> > >> > > about this
> > >>> > >> > [Jia] The `watch` API uses the `streaming` RPC in gRPC. It is a
> > >>> > >> > straightforward proxy behavior: if a connection is broken, the client
> > >>> > >> > will simply retry watching again.
> > >>> > >> >
> > >>> > >> >
> > >>> > >> > > Minor issues:
> > >>> > >> > > - Maybe you can consider using ledgerId and not ledger_id,
> > like
> > >>> in
> > >>> > >> > > LedgerMetadataFormat we are using lastEntryId
> > >>> > >> > >
> > >>> > >> > [Jia] Thanks. That is the protobuf style; protobuf will convert
> > >>> > >> > `ledger_id` to `ledgerId`. We don’t need to worry about this.
> > >>> > >> >
> > >>> > >>
> > >>> > >> got it, thanks
> > >>> > >>
> > >>> > >>
> > >>> > >> >
> > >>> > >> >
> > >>> > >> > > - In the "motivation" part you write that having more clients than
> > >>> > >> > > bookies would be a problem for ZooKeeper; actually ZooKeeper is very
> > >>> > >> > > good at dealing with a huge number of clients. I am always running
> > >>> > >> > > clusters with 3-5 bookies and 10-100 writing clients and this has
> > >>> > >> > > never given me trouble
> > >>> > >> >
> > >>> > >> > [Jia] :) Seems “10-100 writing clients” is not “a huge number of
> > >>> > >> > clients”.
> > >>> > >> >
> > >>> > >>
> > >>> > >> OK, I agree with you and Sijie; I have no experience with larger
> > >>> > >> clusters
> > >>> > >>
> > >>> > >>
> > >>> > >> >
> > >>> > >> > >
> > >>> > >> >
> > >>> > >> >
> > >>> > >> >
> > >>> > >> > > Future:
> > >>> > >> > > - as bookies will be proxies, maybe we should take care not to
> > >>> > >> > > overwhelm a bookie with too many clients
> > >>> > >> > >
> > >>> > >> > [Jia] First, gRPC is based on Netty and the protocol is HTTP/2, so
> > >>> > >> > the connection is multiplexed. We don’t need to worry about
> > >>> > >> > connection count. Second, all the bookies are treated equally for the
> > >>> > >> > metadata operations, and gRPC will load balance the requests across
> > >>> > >> > the bookies. We don’t need to worry about some bookies being
> > >>> > >> > overwhelmed.
> > >>> > >> >
> > >>> > >>
> > >>> > >> gRPC sounds great
> > >>> > >>
> > >>> > >>
> > >>> > >> >
> > >>> > >> >
> > >>> > >> > > - iteration on ledgers: sometimes the client enumerates ledgers but
> > >>> > >> > > is not interested in having all of them. As we are using the bookie
> > >>> > >> > > as a proxy, maybe some kind of "filter" (at least on custom
> > >>> > >> > > metadata) could be created to limit the number of returned items.
> > >>> > >> > > Another point: I don't know gRPC, but it does not seem very clear
> > >>> > >> > > how to 'stop' the iteration
> > >>> > >> > >
> > >>> > >> > [Jia] Thanks, we can add it later. For now, we would like to focus on
> > >>> > >> > adding the features the ledger manager needs.
> > >>> > >> >
> > >>> > >>
> > >>> > >> Yup
> > >>> > >>
> > >>> > >> -- Enrico
> > >>> > >>
> > >>> > >>
> > >>> > >> >
> > >>> > >> > >
> > >>> > >> > > -- Enrico
> > >>> > >> > >
> > >>> > >> > >
> > >>> > >> > > 2017-09-05 15:10 GMT+02:00 Jia Zhai <zhaiji...@gmail.com>:
> > >>> > >> > >
> > >>> > >> > > > Hi all,
> > >>> > >> > > >
> > >>> > >> > > > I have just posted a proposal to remove the zookeeper dependency
> > >>> > >> > > > from the bookkeeper client, to make the bookkeeper client a thin
> > >>> > >> > > > client:
> > >>> > >> > > >
> > >>> > >> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-16%3A+remove+zookeeper+dependency+from+bookkeeper+client
> > >>> > >> > > >
> > >>> > >> > > >
> > >>> > >> > > > BookKeeper uses zookeeper for service discovery (discovering the
> > >>> > >> > > > available bookies in the cluster) and metadata management (storing
> > >>> > >> > > > all the metadata for ledgers). However, it exposes the metadata
> > >>> > >> > > > storage directly to the clients, making the bookkeeper client a
> > >>> > >> > > > very thick client. It also exposes some problems.
> > >>> > >> > > >
> > >>> > >> > > > This BP explores the possibility of eliminating zookeeper
> > >>> > >> > > > completely from the client side, to produce a thin bookkeeper
> > >>> > >> > > > client.
> > >>> > >> > > >
> > >>> > >> > > > I will send a patch as soon as we agree on the proposal.
> > >>> > >> > > >
> > >>> > >> > > >
> > >>> > >> > > > Thanks.
> > >>> > >> > > >
> > >>> > >> > > > -Jia
> > >>> > >> > > >
> > >>> > >> > >
> > >>> > >> >
> > >>> > >>
> > >>> > > --
> > >>> > >
> > >>> > >
> > >>> > > -- Enrico Olivelli
> > >>> > >
> > >>> > --
> > >>> >
> > >>> >
> > >>> > -- Enrico Olivelli
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
>
