Hey Sumit,

I have no doubt that there are benefits to using tags. But the usage of tags is actually orthogonal to the usage of cluster.id. I am not sure the benefits of tags that you listed help us decide whether a randomly generated cluster.id is better than a readable cluster.id from config.
In addition, it is hard to evaluate your suggested approach until we know the goal and implementation details of this approach. There are some interesting questions regarding your approach. Let me list some of them:
- Using a readable cluster.id doesn't rule out using tags. Would it be better to use a readable cluster.id + readable tags than a random cluster.id + readable tags?
- Do you even need the cluster-id to distinguish between clusters if you have tags?
- Are you going to include both the random cluster-id and the tags in the sensor name?

I am happy to discuss this approach in more detail if you can provide the goal and motivation in either this KIP or a new KIP.

Thanks,
Dong

On Sat, Sep 3, 2016 at 5:32 PM, sumit arrawatia <sumit.arrawa...@gmail.com> wrote:

> Hi Dong,

> Please find my comments inline.

> Hopefully they address your concerns.

> Have a great weekend!
> Sumit

> On Sat, Sep 3, 2016 at 3:17 PM, Dong Lin <lindon...@gmail.com> wrote:

> > Hi Sumit,

> > Please see my comments inline.

> > On Sat, Sep 3, 2016 at 10:33 AM, sumit arrawatia <sumit.arrawa...@gmail.com> wrote:

> > > Hi Dong,

> > > Please see my comments inline.

> > > Sumit

> > > On Sat, Sep 3, 2016 at 9:26 AM, Dong Lin <lindon...@gmail.com> wrote:

> > > > Hey Sumit,

> > > > Thanks for your explanation. It seems that the concern is that we cannot easily change cluster.id if we read it from config. Maybe the KIP should also specify the requirements for users to change the cluster.id.

> > > > But it seems to me that it is equally straightforward to change cluster.id in both approaches. Do you think the following approach would work:

> > > > *How does cluster.id from config work:*

> > > > When Kafka starts, it reads cluster.id from config. And then it reads cluster.id from zookeeper.
> > > > - if the cluster.id is not specified in zookeeper, create the znode.
> > > > - if the cluster.id is specified in zookeeper:
> > > >   - if the cluster.id in the znode is the same as that in the config, proceed.
> > > >   - otherwise, broker startup fails and it reports an error. Note that we don't make this change after startup.

> > > This is how the current approach of generating ids would work, so I agree with you that this is how setting the cluster id from config would work too. :)

> > > > *How can we change cluster.id:*

> > > > - Update the kafka broker config to use the new cluster.id.
> > > > - Either delete that znode from zookeeper, or update the kafka broker config to use a new zookeeper which doesn't have that znode.
> > > > - Do a rolling bounce.

> > > > With the current approach described in the KIP, if you want to change the cluster.id, you need to either delete the znode, or change the znode content, before doing a rolling bounce. I don't think the approach above is more difficult than this. Any ideas?
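To make the startup check described above concrete, here is a minimal sketch in Java. This is not code from the KIP; the ClusterIdStore interface and its method names are placeholders for whatever ZooKeeper helpers the broker already has, and only illustrate the check as Dong describes it.

    import java.util.Optional;

    public final class ClusterIdCheck {

        /** Placeholder for the broker's ZooKeeper helpers (hypothetical, not the KIP's API). */
        public interface ClusterIdStore {
            Optional<String> read();       // return the cluster id stored in the znode, if it exists
            void create(String clusterId); // create the znode with the given cluster id
        }

        /**
         * Run once at broker startup, before the broker registers itself:
         * - no id in ZooKeeper  -> register the configured one
         * - matching id         -> proceed
         * - mismatching id      -> fail fast so the broker never joins the wrong cluster
         */
        public static void verifyOrRegister(String configuredClusterId, ClusterIdStore store) {
            Optional<String> existing = store.read();
            if (!existing.isPresent()) {
                store.create(configuredClusterId);
            } else if (!existing.get().equals(configuredClusterId)) {
                throw new IllegalStateException("cluster.id mismatch: config has '" + configuredClusterId
                        + "' but ZooKeeper has '" + existing.get() + "'");
            }
            // ids match: continue startup as usual
        }
    }

With a check like this, a misconfigured broker fails fast instead of silently joining the wrong cluster, which is the error-reporting behavior suggested in the thread.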
> > > I agree that this approach will work, but only if we don't store cluster.id in meta.properties on disk. But I think you will like the proposed approach better if I provide some more context.

> > I am not sure why you mention "... only if we don't store cluster.id in meta.properties", since neither the KIP nor my suggestion asks to store cluster.id in meta.properties. Sorry if the confusion comes from one of my earlier emails where I said we can read cluster.id from meta.properties. In my proposal it should be read from the same broker config file where we configure the zookeeper url.

> > Anyway, since my proposed approach doesn't require the broker to store cluster.id in meta.properties, can I say we agree that users can easily change cluster.id with this approach?

> > > I understand that your concern is that cluster ids should be human readable and it is therefore better to let the user set it and modify it. I agree that we should have human readable names, as it makes it easier for users to identify the cluster in their monitoring dashboard. But we also want to allow users to change this name easily.

> > Yes, we all agree on this.

> > > At the same time, we want the cluster identity to be unique and fairly constant for use cases like auditing. If we allow the users to set it, we place the burden on the users to maintain uniqueness.

> > I disagree that it is a burden for users to configure a unique cluster.id for each cluster. There are already things that users need to configure correctly, such as the zookeeper url. I don't think it is overwhelming for users to configure cluster.id correctly. What we can do is let Kafka report an error if it is not configured correctly, which is covered in the approach I suggested.

> Let's agree to disagree here :). It might be easy to coordinate uniqueness if you have a centralized data infrastructure and a single team which takes care of all changes. This is typically the case for either very small organizations or very mature large organizations. But even these organizations can have a lot of clusters because of different requirements (like varying SLAs, separating PII from non-PII data, separate data paths for client-facing vs non-client-facing applications, etc.).

> Coordinating uniqueness across the organization when you have multiple clusters is a significant burden, especially when there is no way to enforce it across multiple clusters (unlike the zookeeper url, which will cause Kafka to fail or behave badly if it is not configured properly).

> > > So, the approach we propose is to generate an immutable UUID for the cluster and allow the users to assign a human readable name to the cluster using resource tags. Tags also allow you to add structured metadata in addition to the human readable name. For example, if you want to identify your cluster based on region, datacenter, env, etc., you can add these as tags. Otherwise you will need to encode this in your cluster.id property.

> > > The current KIP lays the foundation for this approach. We plan to implement the first part (UUID) in a manner that makes it easy to add the second part (human readable names using tags) in a future KIP.

> > Are you suggesting that it is better to use a randomly generated cluster.id + human-readable tags than a human-readable cluster.id? If so, can you explain the reason?

> Yes. The reason is the same as mentioned in the paragraph above. Here are some of the reasons off the top of my head:

> 1. It is much more flexible: you can easily add tags when your deployment metadata or naming scheme changes.
> For example, if you want to identify your cluster based on region, datacenter, env, etc., you can add these as tags (Name=log-aggregation-kafka, dc=az-1, env=production).

> If you need to do the same for cluster.id, you will have to come up with a naming scheme like <dc>-<env>-<production>.

> Now consider that you want to add another dimension like region, but only for production client-facing clusters. It is easy to add just that tag to the metadata of just these clusters.

> Now if you have to do it using the cluster id, you will have to update your naming scheme and update all your clusters with new ids so that uniqueness across clusters is maintained.

> Also, consider the code (for monitoring, alerting, auditing, log aggregation) that will parse this cluster id: every time you need to change the naming scheme for the cluster id, you will need to change the code everywhere. But if you use tags, you will only need to update the parts of the code which need that tag.

> 2. Updating these tags would be really easy and would not need any downtime, as they are just metadata associated with a unique cluster id. This metadata would be stored in the /cluster znode along with the generated cluster id and can be updated easily using tools, with no reboots or rolling upgrades required. (Please note that the KIP for resource tags is not finalized, so the details might change, but the essential point remains the same.)

> 3. Use cases which depend on cluster ids being stable, like security auditing or data auditing, will not be impacted if the tags themselves change. Otherwise you will need to maintain a mapping of old cluster.id to new cluster.id or change the audit logs/storage to reflect the cluster.id change.

> Even monitoring and log aggregation use cases benefit from having a stable cluster id. If you change the cluster id, then either you will need to throw away all the old historical data or update it with the new cluster id.

> 4. Tags will allow you to create UIs/scripts to manage multiple clusters easily. You can query clusters by tags and target commands at them. This means generic tools can be created easily by the community for these use cases.

> > > Hopefully this helps to clarify your doubts and concerns.

> > > > Cheers,
> > > > Dong

> > > > On Sat, Sep 3, 2016 at 12:46 AM, sumit arrawatia <sumit.arrawa...@gmail.com> wrote:

> > > > > Hi Dong,

> > > > > If you set the cluster.id in the config, the problem is how you change/update the cluster.id.

> > > > > You will need to change all the config files and make sure every one of them is correct, as well as update the ZK metadata. This will require a reboot/downtime of the entire cluster, whereas generating ids (along with the yet-to-be-published resource tags KIP) makes it easy for the admins to update the human readable name without reboots or cluster-wide changes to config.

> > > > > Also, when we implement the change to write cluster.id in meta.properties, the process of updating the cluster.id becomes even more complicated.

> > > > > 1. We will have to ensure that the entire process is transactional. It will be painful from an operational standpoint, as it would require as much operational downtime and care as a version upgrade because it will modify the on-disk data.
> > > > > The recovery/rollback scenario would be very difficult and would probably need manual changes to meta.properties.

> > > > > 2. Given that we would want to generate an error if the cluster.id in meta.properties doesn't match the cluster.id in ZK, we will have to set up complicated logic in ZK to ensure we forgo the check when changing the cluster.id. I am not even sure how to do it properly for a rolling upgrade without downtime.

> > > > > All these points are based on my experience running Elasticsearch in production. Elasticsearch specifies cluster.name statically in the properties and includes it in the data directory name, and changing it is a nightmare.

> > > > > You would think that naming changes should be rare, but in my experience they are not. Sometimes typos creep in, sometimes you have to change the name to consolidate clusters or divide them, and sometimes the infra team decides to change the deployment metadata.

> > > > > This is why I think the AWS approach (assigning immutable ids to resources + human readable names in tags) works very well operationally.

> > > > > Hope this helps!

> > > > > Cheers,

> > > > > Sumit

> > > > > On Fri, Sep 2, 2016 at 11:44 PM, Dong Lin <lindon...@gmail.com> wrote:

> > > > > > Hey Ismael,

> > > > > > Thanks for your reply. Please see my comments inline.

> > > > > > On Fri, Sep 2, 2016 at 8:28 PM, Ismael Juma <ism...@juma.me.uk> wrote:

> > > > > > > Hi Dong,

> > > > > > > Thanks for your feedback. Comments inline.

> > > > > > > On Thu, Sep 1, 2016 at 7:51 PM, Dong Lin <lindon...@gmail.com> wrote:

> > > > > > > > I share the view with Harsha and would like to understand how the current approach of randomly generating cluster.id compares with the approach of manually specifying it in meta.properties.

> > > > > > > Harsha's suggestion in the thread was to store the generated id in meta.properties, not to manually specify it via meta.properties.

> > > > > > > > I think one big advantage of defining it manually in zookeeper is that we can easily tell which cluster it is by simply looking at the sensor name, which makes it more useful to the auditing or monitoring use-case that this KIP intends to address.

> > > > > > > If you really want to customise the name, it is possible with the current proposal: save the appropriate znode in ZooKeeper before a broker auto-generates it. We don't encourage that because once you have a meaningful name, there's a good chance that you may want to change it in the future. And things break down at that point. That's why we prefer having a generated, unique and immutable id complemented by a changeable human readable name.
> > > > > > > As described in the KIP, we think the latter can be achieved more generally via resource tags (which will be a separate KIP).

> > > > > > Can you elaborate on what will break down if we need to change the name?

> > > > > > Even if we cannot change the name because something will break in that case, it seems that it is still better to read the id from config than to use a randomly generated ID. In my suggested solution, users can simply choose not to change the name and make sure there is a unique id per cluster. In your proposal, you need to store the old cluster.id and manually restore it in zookeeper in some scenarios. What do you think?

> > > > > > > > On the other hand, you can only tell whether two sensors are measuring the same cluster or not. Also note that even this goal is not easily guaranteed, because you need an external mechanism to manually re-generate the znode with the old cluster.id if the znode is deleted or if the same cluster (w.r.t. purpose) is changed to use a different zookeeper.

> > > > > > > If we assume that znodes can be deleted at random, the cluster id is probably the least of one's worries. And yes, when moving to a different ZooKeeper while wanting to retain the cluster id, you would have to set the znode manually. This doesn't seem too onerous compared to the other work you will have to do for this scenario.

> > > > > > Maybe this work is not much compared to the other work. But we can agree that no work is better than a little work, right? I am interested to see if we can avoid the work and still meet the motivation and goals of this KIP.

> > > > > > > > I read your reply to Harsha but I still don't fully understand your concern with that approach. I think the broker can simply register group.id in that znode if it is not specified yet, in the same way that this KIP proposes to do it, right? Can you please elaborate more about your concern with this approach?

> > > > > > > It's a bit difficult to answer this comment because it seems like the intent of your suggestion is different than Harsha's.

> > > > > > > I am not necessarily opposed to storing the cluster id in meta.properties (note that we have one meta.properties per log.dir), but I think there are a number of things that need to be discussed and I don't think we need to block KIP-78 while that takes place. Delivering features incrementally is a good thing in my opinion (KIP-31/32, KIP-33 and KIP-79 are good recent examples).
> > > > > > If I understand it right, the motivation of this KIP is to allow a cluster to be uniquely identified. This is a useful feature and I am not asking for anything beyond this scope. It is just that reading cluster.id from config seems to be a better solution in order to meet the motivation and all the goals described in the KIP. More specifically, a cluster.id from config not only allows users to distinguish between different clusters, it also lets users identify a cluster. In comparison, a randomly generated cluster.id allows users to distinguish clusters with a little bit more effort, and doesn't allow users to identify a cluster by simply reading e.g. the sensor name. Did I miss something here?

> > > > > > > Ismael

> > > > > > > P.S. For what it's worth, the following version of the KIP includes an incomplete description (it assumes a single meta.properties, but there could be many) of what the broker would have to do if we wanted to save to meta.properties and potentially restore the znode from it. The state space becomes a lot more complex, increasing the potential for bugs (we had a few for generated broker ids). In contrast, the current proposal is very simple and doesn't prevent us from introducing the additional functionality later.

> > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868433

> > > > > > IMO reading cluster.id from config should be as easy as reading the broker id from config. Storing cluster.id from config in the znode requires the same amount of effort as storing a randomly generated cluster.id in the znode. Maybe I missed something here. Can you point me to the section of the KIP that explains why it is more difficult if we want to read cluster.id from config?

> > > > > --
> > > > > Regards,
> > > > > Sumit

> > > --
> > > Regards,
> > > Sumit

> --
> Regards,
> Sumit