I don't think we should break existing topics. Just disallow new
topics going forward.

Agree that having both is horrible, but we should have a solution that
fails when you run "kafka_topics.sh --create", not when you configure
Ganglia.

Gwen

On Fri, Jul 10, 2015 at 1:53 PM, Jay Kreps <j...@confluent.io> wrote:
> Unfortunately '.' is pretty common too. I agree that it is perverse, but
> people seem to do it. Breaking all the topics with '.' in the name seems
> like it could be worse than combining metrics for people who have a
> 'foo_bar' AND 'foo.bar' (and after all, having both is DEEPLY perverse,
> no?).
>
> Where is our Dean of Compatibility, Ewen, on this?
>
> -Jay
>
> On Fri, Jul 10, 2015 at 1:32 PM, Todd Palino <tpal...@gmail.com> wrote:
>
>> My selfish point of view is that we do #1, as we use "_" extensively in
>> topic names here :) I also happen to think it's the right choice,
>> specifically because "." has more special meanings, as you noted.
>>
>> -Todd
>>
>>
>> On Fri, Jul 10, 2015 at 1:30 PM, Gwen Shapira <gshap...@cloudera.com>
>> wrote:
>>
>> > Unintentional side effect from allowing IP addresses in consumer client
>> > IDs :)
>> >
>> > So the question is, what do we do now?
>> >
>> > 1) disallow "."
>> > 2) disallow "_"
>> > 3) find a reversible way to encode "." and "_" that won't break existing
>> > metrics
>> > 4) all of the above?
>> >
>> > btw. it looks like "." and ".." are currently valid. Topic names are
>> > used for directories, right? this sounds like fun :)
>> >
>> > I vote for option #1, although if someone has a good idea for #3 it
>> > will be even better.
>> >
>> > Gwen
>> >
>> >
>> >
>> > On Fri, Jul 10, 2015 at 1:22 PM, Grant Henke <ghe...@cloudera.com>
>> wrote:
>> > > Found it was added here:
>> https://issues.apache.org/jira/browse/KAFKA-697
>> > >
>> > > On Fri, Jul 10, 2015 at 3:18 PM, Todd Palino <tpal...@gmail.com>
>> wrote:
>> > >
>> > >> This was definitely changed at some point after KAFKA-495. The
>> question
>> > is
>> > >> when and why.
>> > >>
>> > >> Here's the relevant code from that patch:
>> > >>
>> > >> ===================================================================
>> > >> --- core/src/main/scala/kafka/utils/Topic.scala (revision 1390178)
>> > >> +++ core/src/main/scala/kafka/utils/Topic.scala (working copy)
>> > >> @@ -21,24 +21,21 @@
>> > >>  import util.matching.Regex
>> > >>
>> > >>  object Topic {
>> > >> +  val legalChars = "[a-zA-Z0-9_-]"
>> > >>
>> > >>
>> > >>
>> > >> -Todd
>> > >>
>> > >>
>> > >> On Fri, Jul 10, 2015 at 1:02 PM, Grant Henke <ghe...@cloudera.com>
>> > wrote:
>> > >>
>> > >> > kafka.common.Topic shows that currently period is a valid character
>> > and I
>> > >> > have verified I can use kafka-topics.sh to create a new topic with a
>> > >> > period.
>> > >> >
>> > >> >
>> > >> > AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK currently
>> > uses
>> > >> > Topic.validate before writing to Zookeeper.
>> > >> >
>> > >> > Should period character support be removed? I was under the same
>> > >> impression
>> > >> > as Gwen, that a period was used by many as a way to "group" topics.
>> > >> >
>> > >> > The code is pasted below since its small:
>> > >> >
>> > >> > object Topic {
>> > >> >   val legalChars = "[a-zA-Z0-9\\._\\-]"
>> > >> >   private val maxNameLength = 255
>> > >> >   private val rgx = new Regex(legalChars + "+")
>> > >> >
>> > >> >   val InternalTopics = Set(OffsetManager.OffsetsTopicName)
>> > >> >
>> > >> >   def validate(topic: String) {
>> > >> >     if (topic.length <= 0)
>> > >> >       throw new InvalidTopicException("topic name is illegal, can't
>> be
>> > >> > empty")
>> > >> >     else if (topic.equals(".") || topic.equals(".."))
>> > >> >       throw new InvalidTopicException("topic name cannot be \".\" or
>> > >> > \"..\"")
>> > >> >     else if (topic.length > maxNameLength)
>> > >> >       throw new InvalidTopicException("topic name is illegal, can't
>> be
>> > >> > longer than " + maxNameLength + " characters")
>> > >> >
>> > >> >     rgx.findFirstIn(topic) match {
>> > >> >       case Some(t) =>
>> > >> >         if (!t.equals(topic))
>> > >> >           throw new InvalidTopicException("topic name " + topic + "
>> is
>> > >> > illegal, contains a character other than ASCII alphanumerics, '.',
>> '_'
>> > >> and
>> > >> > '-'")
>> > >> >       case None => throw new InvalidTopicException("topic name " +
>> > topic
>> > >> +
>> > >> > " is illegal,  contains a character other than ASCII alphanumerics,
>> > '.',
>> > >> > '_' and '-'")
>> > >> >     }
>> > >> >   }
>> > >> > }
>> > >> >
>> > >> > On Fri, Jul 10, 2015 at 2:50 PM, Todd Palino <tpal...@gmail.com>
>> > wrote:
>> > >> >
>> > >> > > I had to go look this one up again to make sure -
>> > >> > > https://issues.apache.org/jira/browse/KAFKA-495
>> > >> > >
>> > >> > > The only valid character names for topics are alphanumeric,
>> > underscore,
>> > >> > and
>> > >> > > dash. A period is not supposed to be a valid character to use. If
>> > >> you're
>> > >> > > seeing them, then one of two things have happened:
>> > >> > >
>> > >> > > 1) You have topic names that are grandfathered in from before that
>> > >> patch
>> > >> > > 2) The patch is not working properly and there is somewhere in the
>> > >> broker
>> > >> > > that the standard is not being enforced.
>> > >> > >
>> > >> > > -Todd
>> > >> > >
>> > >> > >
>> > >> > > On Fri, Jul 10, 2015 at 12:13 PM, Brock Noland <br...@apache.org>
>> > >> wrote:
>> > >> > >
>> > >> > > > On Fri, Jul 10, 2015 at 11:34 AM, Gwen Shapira <
>> > >> gshap...@cloudera.com>
>> > >> > > > wrote:
>> > >> > > > > Hi Kafka Fans,
>> > >> > > > >
>> > >> > > > > If you have one topic named "kafka_lab_2" and the other named
>> > >> > > > > "kafka.lab.2", the topic level metrics will be named
>> kafka_lab_2
>> > >> for
>> > >> > > > > both, effectively making it impossible to monitor them
>> properly.
>> > >> > > > >
>> > >> > > > > The reason this happens is that using "." in topic names is
>> > pretty
>> > >> > > > > common, especially as a way to group topics into data centers,
>> > >> > > > > relevant apps, etc - basically a work-around to our current
>> > lack of
>> > >> > > > > name spaces. However, most metric monitoring systems using "."
>> > to
>> > >> > > > > annotate hierarchy, so to avoid issues around metric names,
>> > Kafka
>> > >> > > > > replaces the "." in the name with an underscore.
>> > >> > > > >
>> > >> > > > > This generates good metric names, but creates the problem with
>> > name
>> > >> > > > collisions.
>> > >> > > > >
>> > >> > > > > I'm wondering if it makes sense to simply limit the range of
>> > >> > > > > characters permitted in a topic name and disallow "_"?
>> Obviously
>> > >> > > > > existing topics will need to remain as is, which is a bit
>> > awkward.
>> > >> > > >
>> > >> > > > Interesting problem! Many if not most users I personally am
>> aware
>> > of
>> > >> > > > use "_" as a separator in topic names. I am sure that many users
>> > >> would
>> > >> > > > be quite surprised by this limitation. With that said, I am sure
>> > >> > > > they'd transition accordingly.
>> > >> > > >
>> > >> > > > >
>> > >> > > > > If anyone has better backward-compatible solutions to this,
>> I'm
>> > all
>> > >> > > ears
>> > >> > > > :)
>> > >> > > > >
>> > >> > > > > Gwen
>> > >> > > >
>> > >> > >
>> > >> >
>> > >> >
>> > >> >
>> > >> > --
>> > >> > Grant Henke
>> > >> > Solutions Consultant | Cloudera
>> > >> > ghe...@cloudera.com | twitter.com/gchenke |
>> > linkedin.com/in/granthenke
>> > >> >
>> > >>
>> > >
>> > >
>> > >
>> > > --
>> > > Grant Henke
>> > > Solutions Consultant | Cloudera
>> > > ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>> >
>>

Reply via email to