I like the "let's warn people of conflicts when creating the topic"
suggestion. IMO, automatic topic creation as currently done is buggy
either way (send data and hope the topic is ready before retries run
out, potentially failing with the super helpful NO_LEADER error), so I
don't mind leaving it broken a bit more. I think the right behavior is
for conflicts to cause auto-creation to fail, the same way it currently
fails when the default number of replicas is higher than the number of
brokers.
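
As a rough illustration of that behavior, here is a minimal Scala sketch
of a creation-time check. It assumes the metric name is derived simply by
replacing "." with "_", as described further down in this thread. The
names TopicCollisionCheck, metricName, findCollision and
validateNoCollision are hypothetical, not existing Kafka APIs, and how
the set of existing topics is fetched is deliberately left out.

```scala
// Hypothetical sketch only; none of these names exist in Kafka.
object TopicCollisionCheck {
  // Assumed conversion from topic name to metric name: "." becomes "_".
  def metricName(topic: String): String = topic.replace('.', '_')

  // Returns an existing topic whose metric name would clash with the new
  // topic's metric name, if there is one. A topic never clashes with itself.
  def findCollision(newTopic: String, existing: Set[String]): Option[String] = {
    val target = metricName(newTopic)
    existing.find(t => t != newTopic && metricName(t) == target)
  }

  // Fails topic creation on a conflict instead of letting the metrics
  // silently merge later.
  def validateNoCollision(newTopic: String, existing: Set[String]): Unit =
    findCollision(newTopic, existing).foreach { clash =>
      throw new IllegalArgumentException(
        s"topic '$newTopic' collides with existing topic '$clash' in metric names")
    }
}
```

With this shape, creating "kafka.lab.2" while "kafka_lab_2" exists would
be rejected up front, and the same check could run on the auto-creation
path so that it fails there too.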

One thing that is left confusing is that people in the "." camp need
to know about the conversion or they will fail to find their topics in
their monitoring tools. Not very nice to them, but I can't think of
alternatives.

I'll start with the doc patch :)

On Sat, Jul 11, 2015 at 12:54 AM, Ewen Cheslack-Postava
<e...@confluent.io> wrote:
> On Fri, Jul 10, 2015 at 4:41 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>
>> Yeah, I have an actual customer who ran into this. Unfortunately,
>> inconsistencies in the way things are named are pretty common - just
>> look at Kafka's many CLI options.
>>
>> I don't think that supporting both and pointing at the docs with "I
>> told you so" when our metrics break is a good solution.
>>
>
> I agree, especially since we don't *already* have something in the docs
> indicating this will be an issue. I was flippant about the situation
> because I *wish* there was more careful consideration + naming policy in
> place, but I realize that doesn't always happen in practice. I guess I need
> to take Compatibility Czar more seriously :)
>
> I think the obvious practical options are as follows:
>
> 1. Kill support for "_". Piss off the entire set of people who currently
> use "_" anywhere in topic names.
> 2. Kill support for ".". Piss off the entire set of people who currently
> use "." anywhere in topic names.
> 3. Tell people they need to be careful about this issue. Piss off the set
> of people who use both "_" and "." *and* happen to have conflicting topic
> names. They will have some pain when they discover the issue and have to
> figure out how to move one of those topics over to a non-conflicting name.
> I'm going to claim that this group must be an *extremely* small fraction of
> users, which doesn't make it better to allow things to break for them, but
> at least gives us an idea of the scale of impact.
>
> (One other alternative suggested earlier was encoding metric names to
> account for differences; given the metric renaming mess in the last
> release, I'm extremely hesitant to suggest anything of the sort...)
>
> None of the options are ideal, but to me, 3 seems like the least painful.
> Both for us, and for the vast majority of users. It seems to me that the
> number of users that would complain about (1) or (2) drastically outweighs
> the number that would complain about (3).
>
> At this point, I don't think it's practical to keep switching the rules
> about which characters are allowed and which aren't because the previous
> attempts haven't been successful -- it seems the rules have changed
> multiple times, whether intentionally or accidentally, such that any more
> changes will cause problems. At this point, I think we just need to accept
> being liberal in accepting the range of topic names that have been
> permitted so far and make the best of the situation, even if it means only
> being able to warn people of conflicts.
>
> Here's another alternative: how about being liberal with topic name
> characters, but upon topic creation we convert the name to the metric name
> and fail if there's a conflict with another topic? This is relatively
> expensive (requires getting the metric name of all other topics), but it
> avoids the bad situation we're encountering here (conflicting metrics),
> avoids getting into a persistent conflict (we kill topic creation when we
> detect the issue rather than noticing it when the metrics conflict
> happens), and keeps the vast majority of existing users happy (both _ and .
> work in topic names as long as you don't create topics with conflicting
> metric names).
>
> There are definitely details to be worked out (auto topic creation?), but
> it seems like a more realistic solution than to start disallowing _ or . in
> topic names.
>
> -Ewen
>
>
>>
>> On Fri, Jul 10, 2015 at 4:33 PM, Ewen Cheslack-Postava
>> <e...@confluent.io> wrote:
>> > I figure you'll probably see complaints no matter what change you make.
>> > Gwen, given that you raised this, another important question might be how
>> > many people you see using *both*. I'm guessing this question came up
>> > because you actually saw a conflict? But I'd imagine (or at least hope)
>> > that most organizations are mostly consistent about naming topics -- they
>> > standardize on one or the other.
>> >
>> > Since there's no "right" way to name them, I'd just leave it supporting
>> > both and document the potential conflict in metrics. And if people use
>> both
>> > naming schemes, they probably deserve to suffer for their inconsistency
>> :)
>> >
>> > -Ewen
>> >
>> > On Fri, Jul 10, 2015 at 3:28 PM, Gwen Shapira <gshap...@cloudera.com>
>> wrote:
>> >
>> >> I find dots more common in my customer base, so I will definitely feel
>> >> the pain of removing them.
>> >>
>> >> However, "." are already used in metrics, file names, directories, etc
>> >> - so if we keep the dots, we need to keep code that translates them
>> >> and document the translation. Just banning "." seems more natural.
>> >> Also, as Grant mentioned, we'll probably have our own special usage
>> >> for "." down the line.
>> >>
>> >> On Fri, Jul 10, 2015 at 2:12 PM, Todd Palino <tpal...@gmail.com> wrote:
>> >> > I absolutely disagree with #2, Neha. That will break a lot of
>> >> > infrastructure within LinkedIn. That said, removing "." might break
>> other
>> >> > people as well, but I think we should have a clearer idea of how much
>> >> usage
>> >> > there is on either side.
>> >> >
>> >> > -Todd
>> >> >
>> >> >
>> >> > On Fri, Jul 10, 2015 at 2:08 PM, Neha Narkhede <n...@confluent.io>
>> >> wrote:
>> >> >
>> >> >> "." seems natural for grouping topic names. +1 for 2) going forward
>> only
>> >> >> without breaking previously created topics with "_" though that might
>> >> >> require us to patch the code somewhat awkwardly till we phase it out
>> a
>> >> >> couple (purposely left vague to stay out of Ewen's wrath :-))
>> versions
>> >> >> later.
>> >> >>
>> >> >> On Fri, Jul 10, 2015 at 2:02 PM, Gwen Shapira <gshap...@cloudera.com
>> >
>> >> >> wrote:
>> >> >>
>> >> >> > I don't think we should break existing topics. Just disallow new
>> >> >> > topics going forward.
>> >> >> >
>> >> >> > Agree that having both is horrible, but we should have a solution that
>> >> >> > fails when you run "kafka-topics.sh --create", not when you configure
>> >> >> > Ganglia.
>> >> >> >
>> >> >> > Gwen
>> >> >> >
>> >> >> > On Fri, Jul 10, 2015 at 1:53 PM, Jay Kreps <j...@confluent.io>
>> wrote:
>> >> >> > > Unfortunately '.' is pretty common too. I agree that it is
>> perverse,
>> >> >> but
>> >> >> > > people seem to do it. Breaking all the topics with '.' in the
>> name
>> >> >> seems
>> >> >> > > like it could be worse than combining metrics for people who
>> have a
>> >> >> > > 'foo_bar' AND 'foo.bar' (and after all, having both is DEEPLY
>> >> perverse,
>> >> >> > > no?).
>> >> >> > >
>> >> >> > > Where is our Dean of Compatibility, Ewen, on this?
>> >> >> > >
>> >> >> > > -Jay
>> >> >> > >
>> >> >> > > On Fri, Jul 10, 2015 at 1:32 PM, Todd Palino <tpal...@gmail.com>
>> >> >> wrote:
>> >> >> > >
>> >> >> > >> My selfish point of view is that we do #1, as we use "_"
>> >> extensively
>> >> >> in
>> >> >> > >> topic names here :) I also happen to think it's the right
>> choice,
>> >> >> > >> specifically because "." has more special meanings, as you
>> noted.
>> >> >> > >>
>> >> >> > >> -Todd
>> >> >> > >>
>> >> >> > >>
>> >> >> > >> On Fri, Jul 10, 2015 at 1:30 PM, Gwen Shapira <
>> >> gshap...@cloudera.com>
>> >> >> > >> wrote:
>> >> >> > >>
>> >> >> > >> > Unintentional side effect from allowing IP addresses in
>> consumer
>> >> >> > client
>> >> >> > >> > IDs :)
>> >> >> > >> >
>> >> >> > >> > So the question is, what do we do now?
>> >> >> > >> >
>> >> >> > >> > 1) disallow "."
>> >> >> > >> > 2) disallow "_"
>> >> >> > >> > 3) find a reversible way to encode "." and "_" that won't
>> break
>> >> >> > existing
>> >> >> > >> > metrics
>> >> >> > >> > 4) all of the above?
>> >> >> > >> >
>> >> >> > >> > btw. it looks like "." and ".." are currently valid. Topic
>> names
>> >> are
>> >> >> > >> > used for directories, right? this sounds like fun :)
>> >> >> > >> >
>> >> >> > >> > I vote for option #1, although if someone has a good idea for
>> #3
>> >> it
>> >> >> > >> > will be even better.
>> >> >> > >> >
>> >> >> > >> > Gwen
>> >> >> > >> >
>> >> >> > >> >
>> >> >> > >> >
>> >> >> > >> > On Fri, Jul 10, 2015 at 1:22 PM, Grant Henke <
>> >> ghe...@cloudera.com>
>> >> >> > >> wrote:
>> >> >> > >> > > Found it was added here:
>> >> >> > >> https://issues.apache.org/jira/browse/KAFKA-697
>> >> >> > >> > >
>> >> >> > >> > > On Fri, Jul 10, 2015 at 3:18 PM, Todd Palino <
>> >> tpal...@gmail.com>
>> >> >> > >> wrote:
>> >> >> > >> > >
>> >> >> > >> > >> This was definitely changed at some point after KAFKA-495.
>> The
>> >> >> > >> question
>> >> >> > >> > is
>> >> >> > >> > >> when and why.
>> >> >> > >> > >>
>> >> >> > >> > >> Here's the relevant code from that patch:
>> >> >> > >> > >>
>> >> >> > >> > >>
>> >> >> > >> > >> ===================================================================
>> >> >> > >> > >> --- core/src/main/scala/kafka/utils/Topic.scala (revision 1390178)
>> >> >> > >> > >> +++ core/src/main/scala/kafka/utils/Topic.scala (working copy)
>> >> >> > >> > >> @@ -21,24 +21,21 @@
>> >> >> > >> > >>  import util.matching.Regex
>> >> >> > >> > >>
>> >> >> > >> > >>  object Topic {
>> >> >> > >> > >> +  val legalChars = "[a-zA-Z0-9_-]"
>> >> >> > >> > >>
>> >> >> > >> > >>
>> >> >> > >> > >>
>> >> >> > >> > >> -Todd
>> >> >> > >> > >>
>> >> >> > >> > >>
>> >> >> > >> > >> On Fri, Jul 10, 2015 at 1:02 PM, Grant Henke <
>> >> >> ghe...@cloudera.com>
>> >> >> > >> > wrote:
>> >> >> > >> > >>
>> >> >> > >> > >> > kafka.common.Topic shows that currently period is a valid
>> >> >> > character
>> >> >> > >> > and I
>> >> >> > >> > >> > have verified I can use kafka-topics.sh to create a new
>> >> topic
>> >> >> > with a
>> >> >> > >> > >> > period.
>> >> >> > >> > >> >
>> >> >> > >> > >> >
>> >> >> > >> > >> > AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK
>> >> >> > currently
>> >> >> > >> > uses
>> >> >> > >> > >> > Topic.validate before writing to Zookeeper.
>> >> >> > >> > >> >
>> >> >> > >> > >> > Should period character support be removed? I was under
>> the
>> >> >> same
>> >> >> > >> > >> impression
>> >> >> > >> > >> > as Gwen, that a period was used by many as a way to
>> "group"
>> >> >> > topics.
>> >> >> > >> > >> >
>> >> >> > >> > >> > The code is pasted below since its small:
>> >> >> > >> > >> >
>> >> >> > >> > >> > object Topic {
>> >> >> > >> > >> >   val legalChars = "[a-zA-Z0-9\\._\\-]"
>> >> >> > >> > >> >   private val maxNameLength = 255
>> >> >> > >> > >> >   private val rgx = new Regex(legalChars + "+")
>> >> >> > >> > >> >
>> >> >> > >> > >> >   val InternalTopics = Set(OffsetManager.OffsetsTopicName)
>> >> >> > >> > >> >
>> >> >> > >> > >> >   def validate(topic: String) {
>> >> >> > >> > >> >     if (topic.length <= 0)
>> >> >> > >> > >> >       throw new InvalidTopicException("topic name is illegal, can't be empty")
>> >> >> > >> > >> >     else if (topic.equals(".") || topic.equals(".."))
>> >> >> > >> > >> >       throw new InvalidTopicException("topic name cannot be \".\" or \"..\"")
>> >> >> > >> > >> >     else if (topic.length > maxNameLength)
>> >> >> > >> > >> >       throw new InvalidTopicException("topic name is illegal, can't be longer than " + maxNameLength + " characters")
>> >> >> > >> > >> >
>> >> >> > >> > >> >     rgx.findFirstIn(topic) match {
>> >> >> > >> > >> >       case Some(t) =>
>> >> >> > >> > >> >         if (!t.equals(topic))
>> >> >> > >> > >> >           throw new InvalidTopicException("topic name " + topic + " is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'")
>> >> >> > >> > >> >       case None => throw new InvalidTopicException("topic name " + topic + " is illegal,  contains a character other than ASCII alphanumerics, '.', '_' and '-'")
>> >> >> > >> > >> >     }
>> >> >> > >> > >> >   }
>> >> >> > >> > >> > }
>> >> >> > >> > >> >
>> >> >> > >> > >> > On Fri, Jul 10, 2015 at 2:50 PM, Todd Palino <
>> >> >> tpal...@gmail.com>
>> >> >> > >> > wrote:
>> >> >> > >> > >> >
>> >> >> > >> > >> > > I had to go look this one up again to make sure -
>> >> >> > >> > >> > > https://issues.apache.org/jira/browse/KAFKA-495
>> >> >> > >> > >> > >
>> >> >> > >> > >> > > The only valid character names for topics are
>> >> alphanumeric,
>> >> >> > >> > underscore,
>> >> >> > >> > >> > and
>> >> >> > >> > >> > > dash. A period is not supposed to be a valid character
>> to
>> >> >> use.
>> >> >> > If
>> >> >> > >> > >> you're
>> >> >> > >> > >> > > seeing them, then one of two things have happened:
>> >> >> > >> > >> > >
>> >> >> > >> > >> > > 1) You have topic names that are grandfathered in from
>> >> before
>> >> >> > that
>> >> >> > >> > >> patch
>> >> >> > >> > >> > > 2) The patch is not working properly and there is
>> >> somewhere
>> >> >> in
>> >> >> > the
>> >> >> > >> > >> broker
>> >> >> > >> > >> > > that the standard is not being enforced.
>> >> >> > >> > >> > >
>> >> >> > >> > >> > > -Todd
>> >> >> > >> > >> > >
>> >> >> > >> > >> > >
>> >> >> > >> > >> > > On Fri, Jul 10, 2015 at 12:13 PM, Brock Noland <
>> >> >> > br...@apache.org>
>> >> >> > >> > >> wrote:
>> >> >> > >> > >> > >
>> >> >> > >> > >> > > > On Fri, Jul 10, 2015 at 11:34 AM, Gwen Shapira <
>> >> >> > >> > >> gshap...@cloudera.com>
>> >> >> > >> > >> > > > wrote:
>> >> >> > >> > >> > > > > Hi Kafka Fans,
>> >> >> > >> > >> > > > >
>> >> >> > >> > >> > > > > If you have one topic named "kafka_lab_2" and the other named
>> >> >> > >> > >> > > > > "kafka.lab.2", the topic level metrics will be named kafka_lab_2 for
>> >> >> > >> > >> > > > > both, effectively making it impossible to monitor them properly.
>> >> >> > >> > >> > > > >
>> >> >> > >> > >> > > > > The reason this happens is that using "." in topic names is pretty
>> >> >> > >> > >> > > > > common, especially as a way to group topics into data centers,
>> >> >> > >> > >> > > > > relevant apps, etc - basically a work-around to our current lack of
>> >> >> > >> > >> > > > > name spaces. However, most metric monitoring systems use "." to
>> >> >> > >> > >> > > > > annotate hierarchy, so to avoid issues around metric names, Kafka
>> >> >> > >> > >> > > > > replaces the "." in the name with an underscore.
>> >> >> > >> > >> > > > >
>> >> >> > >> > >> > > > > This generates good metric names, but creates the problem with name
>> >> >> > >> > >> > > > > collisions.
>> >> >> > >> > >> > > > >
>> >> >> > >> > >> > > > > I'm wondering if it makes sense to simply limit the range of
>> >> >> > >> > >> > > > > characters permitted in a topic name and disallow "_"? Obviously
>> >> >> > >> > >> > > > > existing topics will need to remain as is, which is a bit awkward.
>> >> >> > >> > >> > > >
>> >> >> > >> > >> > > > Interesting problem! Many if not most users I
>> >> personally am
>> >> >> > >> aware
>> >> >> > >> > of
>> >> >> > >> > >> > > > use "_" as a separator in topic names. I am sure that
>> >> many
>> >> >> > users
>> >> >> > >> > >> would
>> >> >> > >> > >> > > > be quite surprised by this limitation. With that
>> said,
>> >> I am
>> >> >> > sure
>> >> >> > >> > >> > > > they'd transition accordingly.
>> >> >> > >> > >> > > >
>> >> >> > >> > >> > > > >
>> >> >> > >> > >> > > > > If anyone has better backward-compatible solutions
>> to
>> >> >> this,
>> >> >> > >> I'm
>> >> >> > >> > all
>> >> >> > >> > >> > > ears
>> >> >> > >> > >> > > > :)
>> >> >> > >> > >> > > > >
>> >> >> > >> > >> > > > > Gwen
>> >> >> > >> > >> > > >
>> >> >> > >> > >> > >
>> >> >> > >> > >> >
>> >> >> > >> > >> >
>> >> >> > >> > >> >
>> >> >> > >> > >> > --
>> >> >> > >> > >> > Grant Henke
>> >> >> > >> > >> > Solutions Consultant | Cloudera
>> >> >> > >> > >> > ghe...@cloudera.com | twitter.com/gchenke |
>> >> >> > >> > linkedin.com/in/granthenke
>> >> >> > >> > >> >
>> >> >> > >> > >>
>> >> >> > >> > >
>> >> >> > >> > >
>> >> >> > >> > >
>> >> >> > >> > > --
>> >> >> > >> > > Grant Henke
>> >> >> > >> > > Solutions Consultant | Cloudera
>> >> >> > >> > > ghe...@cloudera.com | twitter.com/gchenke |
>> >> >> > linkedin.com/in/granthenke
>> >> >> > >> >
>> >> >> > >>
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Thanks,
>> >> >> Neha
>> >> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Thanks,
>> > Ewen
>>
>
>
>
> --
> Thanks,
> Ewen
