Yeah, I have an actual customer who ran into this. Unfortunately,
inconsistencies in the way things are named are pretty common - just
look at Kafka's many CLI options.

I don't think that supporting both and pointing at the docs with "I
told you so" when our metrics break is a good solution.
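
To make the failure mode concrete, here is a minimal sketch of the collision and of a check that could run at topic-creation time rather than at monitoring time. It assumes only what this thread describes (a "." in a topic name becomes "_" in the metric name). `MetricNameCollision`, `metricName` and `collisions` are hypothetical names for illustration, not Kafka APIs.

```scala
object MetricNameCollision {
  // Assumption: mirrors the behaviour described in this thread, where "." in a
  // topic name is replaced with "_" when the metric name is built.
  def metricName(topic: String): String = topic.replace('.', '_')

  // Hypothetical create-time check: returns the existing topics whose metric
  // name would be identical to the metric name of the new topic.
  def collisions(newTopic: String, existing: Set[String]): Set[String] =
    existing.filter(t => t != newTopic && metricName(t) == metricName(newTopic))

  def main(args: Array[String]): Unit = {
    val existing = Set("kafka_lab_2", "orders")
    println(metricName("kafka.lab.2"))           // kafka_lab_2
    println(collisions("kafka.lab.2", existing)) // Set(kafka_lab_2)
  }
}
```

A check along these lines would let topic creation be rejected when `collisions` is non-empty, without banning either character outright.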

On Fri, Jul 10, 2015 at 4:33 PM, Ewen Cheslack-Postava
<e...@confluent.io> wrote:
> I figure you'll probably see complaints no matter what change you make.
> Gwen, given that you raised this, another important question might be how
> many people you see using *both*. I'm guessing this question came up
> because you actually saw a conflict? But I'd imagine (or at least hope)
> that most organizations are mostly consistent about naming topics -- they
> standardize on one or the other.
>
> Since there's no "right" way to name them, I'd just leave it supporting
> both and document the potential conflict in metrics. And if people use both
> naming schemes, they probably deserve to suffer for their inconsistency :)
>
> -Ewen
>
> On Fri, Jul 10, 2015 at 3:28 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>
>> I find dots more common in my customer base, so I will definitely feel
>> the pain of removing them.
>>
>> However, "." is already used in metrics, file names, directories, etc.,
>> so if we keep the dots, we need to keep the code that translates them
>> and document the translation. Just banning "." seems more natural.
>> Also, as Grant mentioned, we'll probably have our own special usage
>> for "." down the line.
>>
>> On Fri, Jul 10, 2015 at 2:12 PM, Todd Palino <tpal...@gmail.com> wrote:
>> > I absolutely disagree with #2, Neha. That will break a lot of
>> > infrastructure within LinkedIn. That said, removing "." might break other
>> > people as well, but I think we should have a clearer idea of how much
>> > usage there is on either side.
>> >
>> > -Todd
>> >
>> >
>> > On Fri, Jul 10, 2015 at 2:08 PM, Neha Narkhede <n...@confluent.io> wrote:
>> >
>> >> "." seems natural for grouping topic names. +1 for 2) going forward only
>> >> without breaking previously created topics with "_" though that might
>> >> require us to patch the code somewhat awkwardly till we phase it out a
>> >> couple (purposely left vague to stay out of Ewen's wrath :-)) versions
>> >> later.
>> >>
>> >> On Fri, Jul 10, 2015 at 2:02 PM, Gwen Shapira <gshap...@cloudera.com>
>> >> wrote:
>> >>
>> >> > I don't think we should break existing topics. Just disallow new
>> >> > topics going forward.
>> >> >
>> >> > Agree that having both is horrible, but we should have a solution that
>> >> > fails when you run "kafka-topics.sh --create", not when you configure
>> >> > Ganglia.
>> >> >
>> >> > Gwen
>> >> >
>> >> > On Fri, Jul 10, 2015 at 1:53 PM, Jay Kreps <j...@confluent.io> wrote:
>> >> > > Unfortunately '.' is pretty common too. I agree that it is perverse, but
>> >> > > people seem to do it. Breaking all the topics with '.' in the name seems
>> >> > > like it could be worse than combining metrics for people who have a
>> >> > > 'foo_bar' AND 'foo.bar' (and after all, having both is DEEPLY perverse,
>> >> > > no?).
>> >> > >
>> >> > > Where is our Dean of Compatibility, Ewen, on this?
>> >> > >
>> >> > > -Jay
>> >> > >
>> >> > > On Fri, Jul 10, 2015 at 1:32 PM, Todd Palino <tpal...@gmail.com> wrote:
>> >> > >
>> >> > >> My selfish point of view is that we do #1, as we use "_" extensively in
>> >> > >> topic names here :) I also happen to think it's the right choice,
>> >> > >> specifically because "." has more special meanings, as you noted.
>> >> > >>
>> >> > >> -Todd
>> >> > >>
>> >> > >>
>> >> > >> On Fri, Jul 10, 2015 at 1:30 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>> >> > >>
>> >> > >> > Unintentional side effect from allowing IP addresses in consumer client
>> >> > >> > IDs :)
>> >> > >> >
>> >> > >> > So the question is, what do we do now?
>> >> > >> >
>> >> > >> > 1) disallow "."
>> >> > >> > 2) disallow "_"
>> >> > >> > 3) find a reversible way to encode "." and "_" that won't break existing
>> >> > >> >    metrics
>> >> > >> > 4) all of the above?
>> >> > >> >
>> >> > >> > btw. it looks like "." and ".." are currently valid. Topic names are
>> >> > >> > used for directories, right? this sounds like fun :)
>> >> > >> >
>> >> > >> > I vote for option #1, although if someone has a good idea for #3 it
>> >> > >> > will be even better.
>> >> > >> >
>> >> > >> > Gwen
>> >> > >> >
>> >> > >> >
>> >> > >> >
>> >> > >> > On Fri, Jul 10, 2015 at 1:22 PM, Grant Henke <ghe...@cloudera.com> wrote:
>> >> > >> > > Found it was added here: https://issues.apache.org/jira/browse/KAFKA-697
>> >> > >> > >
>> >> > >> > > On Fri, Jul 10, 2015 at 3:18 PM, Todd Palino <tpal...@gmail.com> wrote:
>> >> > >> > >
>> >> > >> > >> This was definitely changed at some point after KAFKA-495. The question is
>> >> > >> > >> when and why.
>> >> > >> > >>
>> >> > >> > >> Here's the relevant code from that patch:
>> >> > >> > >>
>> >> > >> > >>
>> >> > >> > >> ===================================================================
>> >> > >> > >> --- core/src/main/scala/kafka/utils/Topic.scala (revision 1390178)
>> >> > >> > >> +++ core/src/main/scala/kafka/utils/Topic.scala (working copy)
>> >> > >> > >> @@ -21,24 +21,21 @@
>> >> > >> > >>  import util.matching.Regex
>> >> > >> > >>
>> >> > >> > >>  object Topic {
>> >> > >> > >> +  val legalChars = "[a-zA-Z0-9_-]"
>> >> > >> > >>
>> >> > >> > >>
>> >> > >> > >>
>> >> > >> > >> -Todd
>> >> > >> > >>
>> >> > >> > >>
>> >> > >> > >> On Fri, Jul 10, 2015 at 1:02 PM, Grant Henke <ghe...@cloudera.com> wrote:
>> >> > >> > >>
>> >> > >> > >> > kafka.common.Topic shows that currently period is a valid character and I
>> >> > >> > >> > have verified I can use kafka-topics.sh to create a new topic with a
>> >> > >> > >> > period.
>> >> > >> > >> >
>> >> > >> > >> >
>> >> > >> > >> > AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK currently uses
>> >> > >> > >> > Topic.validate before writing to Zookeeper.
>> >> > >> > >> >
>> >> > >> > >> > Should period character support be removed? I was under the same
>> >> > >> > >> > impression as Gwen, that a period was used by many as a way to "group"
>> >> > >> > >> > topics.
>> >> > >> > >> >
>> >> > >> > >> > The code is pasted below since it's small:
>> >> > >> > >> >
>> >> > >> > >> > object Topic {
>> >> > >> > >> >   val legalChars = "[a-zA-Z0-9\\._\\-]"
>> >> > >> > >> >   private val maxNameLength = 255
>> >> > >> > >> >   private val rgx = new Regex(legalChars + "+")
>> >> > >> > >> >
>> >> > >> > >> >   val InternalTopics = Set(OffsetManager.OffsetsTopicName)
>> >> > >> > >> >
>> >> > >> > >> >   def validate(topic: String) {
>> >> > >> > >> >     if (topic.length <= 0)
>> >> > >> > >> >       throw new InvalidTopicException("topic name is illegal, can't be empty")
>> >> > >> > >> >     else if (topic.equals(".") || topic.equals(".."))
>> >> > >> > >> >       throw new InvalidTopicException("topic name cannot be \".\" or \"..\"")
>> >> > >> > >> >     else if (topic.length > maxNameLength)
>> >> > >> > >> >       throw new InvalidTopicException(
>> >> > >> > >> >         "topic name is illegal, can't be longer than " + maxNameLength + " characters")
>> >> > >> > >> >
>> >> > >> > >> >     rgx.findFirstIn(topic) match {
>> >> > >> > >> >       case Some(t) =>
>> >> > >> > >> >         if (!t.equals(topic))
>> >> > >> > >> >           throw new InvalidTopicException("topic name " + topic +
>> >> > >> > >> >             " is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'")
>> >> > >> > >> >       case None =>
>> >> > >> > >> >         throw new InvalidTopicException("topic name " + topic +
>> >> > >> > >> >           " is illegal,  contains a character other than ASCII alphanumerics, '.', '_' and '-'")
>> >> > >> > >> >     }
>> >> > >> > >> >   }
>> >> > >> > >> > }
>> >> > >> > >> >
>> >> > >> > >> > On Fri, Jul 10, 2015 at 2:50 PM, Todd Palino <tpal...@gmail.com> wrote:
>> >> > >> > >> >
>> >> > >> > >> > > I had to go look this one up again to make sure -
>> >> > >> > >> > > https://issues.apache.org/jira/browse/KAFKA-495
>> >> > >> > >> > >
>> >> > >> > >> > > The only valid characters for topic names are alphanumeric, underscore,
>> >> > >> > >> > > and dash. A period is not supposed to be a valid character to use. If
>> >> > >> > >> > > you're seeing them, then one of two things has happened:
>> >> > >> > >> > >
>> >> > >> > >> > > 1) You have topic names that are grandfathered in from before that patch
>> >> > >> > >> > > 2) The patch is not working properly and there is somewhere in the
>> >> > >> > >> > >    broker that the standard is not being enforced.
>> >> > >> > >> > >
>> >> > >> > >> > > -Todd
>> >> > >> > >> > >
>> >> > >> > >> > >
>> >> > >> > >> > > On Fri, Jul 10, 2015 at 12:13 PM, Brock Noland <br...@apache.org> wrote:
>> >> > >> > >> > >
>> >> > >> > >> > > > On Fri, Jul 10, 2015 at 11:34 AM, Gwen Shapira <gshap...@cloudera.com> wrote:
>> >> > >> > >> > > > > Hi Kafka Fans,
>> >> > >> > >> > > > >
>> >> > >> > >> > > > > If you have one topic named "kafka_lab_2" and the other named
>> >> > >> > >> > > > > "kafka.lab.2", the topic level metrics will be named kafka_lab_2 for
>> >> > >> > >> > > > > both, effectively making it impossible to monitor them properly.
>> >> > >> > >> > > > >
>> >> > >> > >> > > > > The reason this happens is that using "." in topic names is pretty
>> >> > >> > >> > > > > common, especially as a way to group topics into data centers,
>> >> > >> > >> > > > > relevant apps, etc - basically a work-around to our current lack of
>> >> > >> > >> > > > > name spaces. However, most metric monitoring systems use "." to
>> >> > >> > >> > > > > annotate hierarchy, so to avoid issues around metric names, Kafka
>> >> > >> > >> > > > > replaces the "." in the name with an underscore.
>> >> > >> > >> > > > >
>> >> > >> > >> > > > > This generates good metric names, but creates a problem with name
>> >> > >> > >> > > > > collisions.
>> >> > >> > >> > > > >
>> >> > >> > >> > > > > I'm wondering if it makes sense to simply limit the range of
>> >> > >> > >> > > > > characters permitted in a topic name and disallow "_"? Obviously
>> >> > >> > >> > > > > existing topics will need to remain as is, which is a bit awkward.
>> >> > >> > >> > > >
>> >> > >> > >> > > > Interesting problem! Many if not most users I personally am aware of
>> >> > >> > >> > > > use "_" as a separator in topic names. I am sure that many users would
>> >> > >> > >> > > > be quite surprised by this limitation. With that said, I am sure
>> >> > >> > >> > > > they'd transition accordingly.
>> >> > >> > >> > > >
>> >> > >> > >> > > > >
>> >> > >> > >> > > > > If anyone has better backward-compatible solutions to this, I'm all
>> >> > >> > >> > > > > ears :)
>> >> > >> > >> > > > >
>> >> > >> > >> > > > > Gwen
>> >> > >> > >> > > >
>> >> > >> > >> > >
>> >> > >> > >> >
>> >> > >> > >> >
>> >> > >> > >> >
>> >> > >> > >> > --
>> >> > >> > >> > Grant Henke
>> >> > >> > >> > Solutions Consultant | Cloudera
>> >> > >> > >> > ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>> >> > >> > >> >
>> >> > >> > >>
>> >> > >> > >
>> >> > >> > >
>> >> > >> > >
>> >> > >> > > --
>> >> > >> > > Grant Henke
>> >> > >> > > Solutions Consultant | Cloudera
>> >> > >> > > ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>> >> > >> >
>> >> > >>
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Thanks,
>> >> Neha
>> >>
>>
>
>
>
> --
> Thanks,
> Ewen
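
For option 3 in the quoted list (a reversible encoding of "." and "_"), one possible shape is sketched below. This is purely illustrative: the escape scheme ("_" becomes "__", "." becomes "_d") and the names `ReversibleTopicEncoding`, `encode` and `decode` are assumptions, not anything proposed in this thread or present in Kafka. Note that it changes the metric name of every existing topic containing "_", so on its own it does not meet the "won't break existing metrics" constraint.

```scala
object ReversibleTopicEncoding {
  // Hypothetical scheme: '_' acts as an escape character in the metric name.
  //   "_" in the topic -> "__"      "." in the topic -> "_d"
  def encode(topic: String): String = {
    val out = new StringBuilder
    topic.foreach {
      case '_' => out ++= "__"
      case '.' => out ++= "_d"
      case c   => out += c
    }
    out.toString
  }

  // Inverse of encode: every '_' in the input must start a two-character escape.
  def decode(metric: String): String = {
    val out = new StringBuilder
    var i = 0
    while (i < metric.length) {
      if (metric(i) == '_') {
        require(i + 1 < metric.length, "dangling escape character")
        metric(i + 1) match {
          case '_' => out += '_'
          case 'd' => out += '.'
          case c   => throw new IllegalArgumentException("unknown escape sequence _" + c)
        }
        i += 2
      } else {
        out += metric(i)
        i += 1
      }
    }
    out.toString
  }
}
```

With this scheme "kafka.lab.2" and "kafka_lab_2" map to distinct metric names, and `decode(encode(t))` returns `t` for any topic name `t`.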
