Off topic curiosity... Jia and Sijie, do you think we are going to drop ZK
from DL too?
Enrico

On Wed, Sep 6, 2017, 19:51 Enrico Olivelli <eolive...@gmail.com> wrote:

>
>
> On Wed, Sep 6, 2017, 18:25 Sijie Guo <guosi...@gmail.com> wrote:
>
>> On Sep 6, 2017 4:57 AM, "Enrico Olivelli" <eolive...@gmail.com> wrote:
>>
>> Thank you Sijie and Jia for your comments and explanations,
>> answers inline
>>
>> 2017-09-06 2:23 GMT+02:00 Jia Zhai <zhaiji...@gmail.com>:
>>
>> > Thanks a lot Enrico and Sijie for your comments and information on this.
>> >
>> > On Tue, Sep 5, 2017 at 9:31 PM, Enrico Olivelli <eolive...@gmail.com>
>> > wrote:
>> >
>> > > Great to see you working on this!
>> > > It would be great to have such a feature, as it is the first step to a
>> > > 'standalone' BookKeeper mode
>> > >
>> > > Some complementary ideas/first look questions:
>> > > - the document does not talk about security, IMHO we have to cover at
>> > > least authentication and TLS, it would be great to leverage existing
>> > > AuthPlugins, as they are based on exchanging byte[] (as SASL wants)
>> > >
>> > [Jia] It is a good idea. We left out the security part for now for a few
>> > reasons. 1) To keep this BP focused on removing zookeeper dependencies
>> > from the client. 2) It is introduced as a separate implementation of
>> > existing interfaces, so it won’t impact the existing security story. And
>> > for sure, we will add the security part later after this.
>> >
>>
>>
>> I am fine, I am only afraid that we won't be able to support it in the
>> (near) future,
>> maybe you could just cite the security story and add some reference to
>> how we would deal with it in the future
>>
>>
>> The new ledger manager will first be marked as experimental, until it is
>> stable and has security features.
>>
>> How does that sound?
>>
>
> Ok
>
>>
>>
>>
>> >
>> > - do we have some kind of "bootstrap servers list" configuration option?
>> > > should the list be complete or just a subset of bookies? at connection
>> > > the client could discover the list of other bookies
>> > >
>> > [Jia] Yes, we will have a `clientBootstrapBookies` setting in the server
>> > set. It can be a list of bookies or simply a DNS name for the bookies.
>> > Will add this to the BP
>> >
>> > - will the client connect to only one bookie at a time? how will we deal
>> > > with errors?
>> > >
>> > [Jia] It will connect to the list of bootstrap servers. gRPC will load
>> > balance the requests and manage the connection errors.
>> >
>> > - should the bookie write its gRPC endpoint info to the ZK metadata? (this
>> > > will be useful for a bookie to tell the connected clients about other
>> > > bookies)
>> > >
>> > [Jia] No, it won’t. We don’t see a strong reason to add it, especially
>> > since we may eventually eliminate zookeeper completely.
>> > It can be a fixed port `3281`, or in a scheduler-based environment, it is
>> > very easy to have a load balancer sitting in front of those bookies.
>> >
>>
>> I think a fixed port is not a good way.
>> You will not be able to run more than one bookie on a single host.
>>
>> We should support:
>> - configurable port
>> - ephemeral port for tests
>>
>>
>> I think what Jia means is a configurable port, but a relatively fixed one,
>> which the client doesn't discover from zookeeper.
>>
>
> Very good
>
>>
>>
>> Ideally I would like to have the local transport option, in order to have a
>> single JVM, but this is not a blocker problem, as we are running gRPC on
>> netty it should be feasible or we can create some kind of short-circuit
>> between the client and the Bookie
>>
>>
>> gRPC supports an in-process channel. So you don't need to use the low level
>> netty settings.
>>
>
> Great
>
> So it sounds all good to me thanks
>
> Enrico
>
>
>>
>> I am OK with not writing this to the bookie metadata, leaving it up to the
>> client to have a configured list of bookies enabled for metadata operations
>>
>>
>>
>>
>> >
>> > - the bookie will be somehow a proxy for zookeeper, I think that the
>> > > 'watch' part is the most complex, we will have to deal with
>> > > reconnections, errors... maybe it is worth writing more detail about this
>> > >
>> > [Jia] The `watch` API uses the `streaming` rpc in gRPC. It is a
>> > straightforward proxy behavior: if a connection is broken, the client
>> > will simply retry watching again.
>> >
>> >
>> > > Minor issues:
>> > > - Maybe you can consider using ledgerId and not ledger_id, like in
>> > > LedgerMetadataFormat we are using lastEntryId
>> > >
>> > [Jia] Thanks, it is the protobuf style. Protobuf will convert `ledger_id`
>> > to `ledgerId`. We don’t need to worry about this.
>> >
>>
>> got it, thanks
>>
>>
>> >
>> >
>> > > - In the "motivation" part you write that having more clients than
>> > > bookies would be a problem for zookeeper, actually zookeeper is very
>> > > good at dealing with a huge number of clients. Actually I am always
>> > > running clusters with 3-5 bookies and 10-100 writing clients and this
>> > > has never caused trouble
>> >
>> > [Jia] :) Seems “10-100 writing clients” is not “a huge number of
>> clients”.
>> >
>>
>> OK, I agree with you and Sijie, I have no experience of larger clusters
>>
>>
>> >
>> > >
>> >
>> >
>> >
>> > > Future:
>> > > - as bookies will be proxies maybe we should take care not to
>> > > overwhelm a bookie with too many clients
>> > >
>> > [Jia] First, gRPC is based on Netty and the protocol is HTTP/2, so the
>> > connection is multiplexed. We don’t need to worry about connection count.
>> > Second, all the bookies are treated equally for the metadata operations,
>> > and gRPC will load balance the requests across the bookies. We don’t need
>> > to worry about some bookies being overwhelmed.
>> >
>>
>> gRPC sounds great
>>
>>
>> >
>> >
>> > > - iteration on ledgers, sometimes the client enumerates ledgers but it
>> > > is not interested in having all of them, as we are using the bookie as
>> > > proxy maybe some kind of "filter" (at least on custom metadata) would be
>> > > great to limit the number of returned items. Another point: I don't know
>> > > gRPC, but it does not seem to be very clear how to 'stop' the iteration
>> > >
>> > [Jia] Thanks, We can add it later. For now, we would like to focus on
>> > adding the features the ledger manager needs.
>> >
>>
>> Yup
>>
>> -- Enrico
>>
>>
>> >
>> > >
>> > > -- Enrico
>> > >
>> > >
>> > > 2017-09-05 15:10 GMT+02:00 Jia Zhai <zhaiji...@gmail.com>:
>> > >
>> > > > Hi all,
>> > > >
>> > > > I have just posted a proposal to remove zookeeper dependency from
>> > > > bookkeeper client, to make bookkeeper client a thin client:
>> > > >
>> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-16%3A+remove+zookeeper+dependency+from+bookkeeper+client
>> > > >
>> > > >
>> > > > BookKeeper uses zookeeper for service discovery (discovering the
>> > > > available bookies in the cluster) and metadata management (storing all
>> > > > the metadata for ledgers). However, it exposes the metadata storage
>> > > > directly to the clients, making the bookkeeper client a very thick
>> > > > client. It also exposes some problems.
>> > > >
>> > > > This BP explores the possibility of eliminating zookeeper completely
>> > > > from the client side, to produce a thin bookkeeper client.
>> > > >
>> > > > I will send a patch as soon as we agree on the proposal.
>> > > >
>> > > >
>> > > > Thanks.
>> > > >
>> > > > -Jia
>> > > >
>> > >
>> >
>>
> --
>
>
> -- Enrico Olivelli
>
-- 


-- Enrico Olivelli
