Re: [DISCUSS] KIP-78: Cluster Id

Ismael Juma Mon, 05 Sep 2016 02:46:05 -0700

Dong,

Sumit responded to a number of points already, so I will try to be brief.
See inline.

Also, it may just be possible that we won't reach agreement. In that case,
a vote may be a way to figure out if people feel that this proposal adds
value in its current form or not.

On Mon, Sep 5, 2016 at 12:54 AM, Dong Lin <lindon...@gmail.com> wrote:

> I don't think have a human-readable name is equivalent to a meaningful
> name. It is not true that a human readable name makes it more likely you
> want to change it. Look, every city has a human readable name and we don't
> worry about changing its name. The conference room in any company has a
> human readable name instead of a random id. For the same reason you can
> name a cluster as Yosemite and don't have to change it in the future.
>

As Sumit said, many cities have in fact changed their names. Incidentally,
all the conference rooms at Confluent were recently renamed. So, this
illustrates the point well. Yes, it is possible to give names that are
human-readable but not meaningful. I still think that a unique and
immutable auto-generated id + a changeable human-readable name is a better
overall solution.

> By immutable I think you are saying that we should prevent people from
> changing cluster.id. However, this KIP doesn't really prevent this from
> happening -- user can delete znode and restart kafka to change cluster.id.
> Therefore the requirement is not satisfied anyway.
>

Sure, we can't prevent users from deleting state in ZooKeeper or elsewhere
if they have access to it. The idea is that users wouldn't need to with the
auto-generated id.

> I am also not sure why you want to prevent people from changing cluster.id
> after reading the motivation section of this KIP. Is there any motivation
> or use-case for this requirement?
>

I thought I explained this a few times. :) Sumit took a stab as well. The
requirement is to reliably associate a message with a cluster. Each time
the cluster id changes, you are basically "creating" a new cluster so it
would look like messages are associated with 2 different clusters instead
of a single one. This is an old database topic, of course: surrogate versus
natural keys.
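
To make the surrogate-versus-natural key point concrete, here is a minimal
sketch. It is not code from the KIP: the Cluster class, the count_clusters
helper and the example names are all hypothetical, and messages are reduced
to the key they were recorded under.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Cluster:
    # Hypothetical model: the display name may change, the id never does.
    name: str
    cluster_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def count_clusters(keys):
    """Number of distinct clusters an auditing tool would see for these keys."""
    return len(set(keys))


cluster = Cluster(name="Yosemite")
by_id, by_name = [], []

# Record one message, rename the cluster, then record another message.
by_id.append(cluster.cluster_id)
by_name.append(cluster.name)
cluster.name = "Sequoia"
by_id.append(cluster.cluster_id)
by_name.append(cluster.name)

print(count_clusters(by_id))    # 1: the surrogate key survives the rename
print(count_clusters(by_name))  # 2: the natural key splits the history
```

Keyed by the auto-generated id, both messages still belong to one cluster
after the rename; keyed by the name, they appear to come from two.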

> It is not clear why it will make downstream code would be more complex and
> feature less useful if we provide a default cluster.id here. For users who
> are not interested in this feature, they can use the cluster.id and all
> downstream application will not be affected. For users who need this
> feature, they can configure a unique human readable cluster.id for their
> clusters. In this case the downstream application will have the same
> complexity as with the approach in this KIP. Did I miss something?
>

Can you please clarify what you mean by "default cluster.id"? I don't
follow what you're saying in the comment above.

> Right, there is no easy way to detect this automatically with Kafka. But
> this is not a requirement to automatically detect violation of uniqueness
> in the first place. SRE can manually make sure that the unique cluster.id
> is given to each cluster in the broker config.

We would like the feature to be useful across the board. Not all teams have
a super capable team of SREs like LinkedIn. Some may not even have SREs at
all. :)

> I am not sure if it is weird. We can seek the view from other SRE and
> developer to understand whether and why it is weird. I can ask our SRE to
> comment as well. It is hard to evaluate whether "weirdness" outweighs the
> benefits from the ability to identify cluster with a human readable
> cluster.id without knowing its impact on the use-case and user experience.
>

It seems that a few things are being conflated here. You can set the
cluster id manually in either proposal. The main differences are:

1. Whether the cluster id is auto-generated if not present (the KIP
proposes auto-generation and, if I understand correctly, you are suggesting
that it should not)
2. How the cluster id can be set manually (you'd have to set the relevant
znode value with the KIP proposal whereas you are suggesting that it should
be possible via a broker config)
3. The recommended workflow (the KIP suggests that you should just rely on
the auto-generated id whereas you are suggesting that setting the value
manually is a good idea).
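
As a rough illustration of the behaviour the KIP side of points 1 and 2
describes, here is a minimal sketch. A plain dict stands in for ZooKeeper,
and the path constant, the function names and the id format are assumptions
made for the example, not details taken from the KIP text.

```python
import base64
import uuid

CLUSTER_ID_PATH = "/cluster/id"  # hypothetical znode path, assumed here


def generate_cluster_id():
    # Assumption: any collision-resistant random id would do; a URL-safe
    # base64 encoding of a random UUID is used only to keep it short.
    raw = base64.urlsafe_b64encode(uuid.uuid4().bytes)
    return raw.decode("ascii").rstrip("=")


def get_or_create_cluster_id(store):
    """Return the stored id, generating and storing one only if absent."""
    # setdefault keeps an existing value, so a manually set id always wins.
    return store.setdefault(CLUSTER_ID_PATH, generate_cluster_id())


# Default workflow: the first broker generates the id, later ones reuse it.
store = {}
first = get_or_create_cluster_id(store)
second = get_or_create_cluster_id(store)

# Manual workflow: the operator sets the value before any broker starts.
manual_store = {CLUSTER_ID_PATH: "yosemite"}
manual = get_or_create_cluster_id(manual_store)
```

A real implementation would need an atomic create-if-absent against
ZooKeeper rather than a dict lookup, so that two brokers starting at the
same time cannot each store a different id.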

> Hmm.. you and Sumit provided two completely difference requirement
> regarding immutability and easiness of change. I share similar view with
> Sumit on this issue. Of course we prefer to avoid changing the config. But
> the one-time config change is probably not a big deal as compared to the
> long-term benefit that comes with human readable in the monitoring/auditing
> use-case.
>

As Sumit clarified, both of us are actually saying the same thing. I am
quite confused when you say that you share a similar view with Sumit on
this issue. :)

Ismael
