My selfish point of view is that we do #1, as we use "_" extensively in topic names here :) I also happen to think it's the right choice, specifically because "." has more special meanings, as you noted.

-Todd

On Fri, Jul 10, 2015 at 1:30 PM, Gwen Shapira <gshap...@cloudera.com> wrote:

> Unintentional side effect from allowing IP addresses in consumer client
> IDs :)
>
> So the question is, what do we do now?
>
> 1) disallow "."
> 2) disallow "_"
> 3) find a reversible way to encode "." and "_" that won't break existing
> metrics
> 4) all of the above?
>
> btw. it looks like "." and ".." are currently valid. Topic names are
> used for directories, right? this sounds like fun :)
>
> I vote for option #1, although if someone has a good idea for #3 it
> will be even better.
>
> Gwen
>
> On Fri, Jul 10, 2015 at 1:22 PM, Grant Henke <ghe...@cloudera.com> wrote:
> > Found it was added here: https://issues.apache.org/jira/browse/KAFKA-697
> >
> > On Fri, Jul 10, 2015 at 3:18 PM, Todd Palino <tpal...@gmail.com> wrote:
> >
> >> This was definitely changed at some point after KAFKA-495. The question is
> >> when and why.
> >>
> >> Here's the relevant code from that patch:
> >>
> >> ===================================================================
> >> --- core/src/main/scala/kafka/utils/Topic.scala (revision 1390178)
> >> +++ core/src/main/scala/kafka/utils/Topic.scala (working copy)
> >> @@ -21,24 +21,21 @@
> >> import util.matching.Regex
> >>
> >> object Topic {
> >> + val legalChars = "[a-zA-Z0-9_-]"
> >>
> >> -Todd
> >>
> >> On Fri, Jul 10, 2015 at 1:02 PM, Grant Henke <ghe...@cloudera.com> wrote:
> >>
> >> > kafka.common.Topic shows that currently period is a valid character and I
> >> > have verified I can use kafka-topics.sh to create a new topic with a
> >> > period.
> >> >
> >> > AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK currently uses
> >> > Topic.validate before writing to Zookeeper.
> >> >
> >> > Should period character support be removed? I was under the same impression
> >> > as Gwen, that a period was used by many as a way to "group" topics.
> >> >
> >> > The code is pasted below since it's small:
> >> >
> >> > object Topic {
> >> >   val legalChars = "[a-zA-Z0-9\\._\\-]"
> >> >   private val maxNameLength = 255
> >> >   private val rgx = new Regex(legalChars + "+")
> >> >
> >> >   val InternalTopics = Set(OffsetManager.OffsetsTopicName)
> >> >
> >> >   def validate(topic: String) {
> >> >     if (topic.length <= 0)
> >> >       throw new InvalidTopicException("topic name is illegal, can't be empty")
> >> >     else if (topic.equals(".") || topic.equals(".."))
> >> >       throw new InvalidTopicException("topic name cannot be \".\" or \"..\"")
> >> >     else if (topic.length > maxNameLength)
> >> >       throw new InvalidTopicException("topic name is illegal, can't be longer than " + maxNameLength + " characters")
> >> >
> >> >     rgx.findFirstIn(topic) match {
> >> >       case Some(t) =>
> >> >         if (!t.equals(topic))
> >> >           throw new InvalidTopicException("topic name " + topic + " is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'")
> >> >       case None => throw new InvalidTopicException("topic name " + topic + " is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'")
> >> >     }
> >> >   }
> >> > }
> >> >
> >> > On Fri, Jul 10, 2015 at 2:50 PM, Todd Palino <tpal...@gmail.com> wrote:
> >> >
> >> > > I had to go look this one up again to make sure -
> >> > > https://issues.apache.org/jira/browse/KAFKA-495
> >> > >
> >> > > The only valid characters for topic names are alphanumerics, underscore,
> >> > > and dash. A period is not supposed to be a valid character to use. If
> >> > > you're seeing them, then one of two things has happened:
> >> > >
> >> > > 1) You have topic names that are grandfathered in from before that patch
> >> > > 2) The patch is not working properly and there is somewhere in the broker
> >> > > that the standard is not being enforced.
> >> > >
> >> > > -Todd
> >> > >
> >> > > On Fri, Jul 10, 2015 at 12:13 PM, Brock Noland <br...@apache.org> wrote:
> >> > >
> >> > > > On Fri, Jul 10, 2015 at 11:34 AM, Gwen Shapira <gshap...@cloudera.com> wrote:
> >> > > > > Hi Kafka Fans,
> >> > > > >
> >> > > > > If you have one topic named "kafka_lab_2" and the other named
> >> > > > > "kafka.lab.2", the topic level metrics will be named kafka_lab_2 for
> >> > > > > both, effectively making it impossible to monitor them properly.
> >> > > > >
> >> > > > > The reason this happens is that using "." in topic names is pretty
> >> > > > > common, especially as a way to group topics into data centers,
> >> > > > > relevant apps, etc - basically a work-around to our current lack of
> >> > > > > name spaces. However, most metric monitoring systems use "." to
> >> > > > > annotate hierarchy, so to avoid issues around metric names, Kafka
> >> > > > > replaces the "." in the name with an underscore.
> >> > > > >
> >> > > > > This generates good metric names, but creates the problem with name
> >> > > > > collisions.
> >> > > > >
> >> > > > > I'm wondering if it makes sense to simply limit the range of
> >> > > > > characters permitted in a topic name and disallow "_"? Obviously
> >> > > > > existing topics will need to remain as is, which is a bit awkward.
> >> > > >
> >> > > > Interesting problem! Many if not most users I personally am aware of
> >> > > > use "_" as a separator in topic names. I am sure that many users would
> >> > > > be quite surprised by this limitation. With that said, I am sure
> >> > > > they'd transition accordingly.
> >> > > >
> >> > > > >
> >> > > > > If anyone has better backward-compatible solutions to this, I'm all ears :)
> >> > > > >
> >> > > > > Gwen
> >> > > >
> >> > >
> >> >
> >> > --
> >> > Grant Henke
> >> > Solutions Consultant | Cloudera
> >> > ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
> >> >
> >>
> >
> > --
> > Grant Henke
> > Solutions Consultant | Cloudera
> > ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>
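
For reference, here is a minimal Scala sketch of the collision discussed in this thread, together with one hypothetical take on option 3 (a reversible encoding). The object name TopicMetricNames, the methods lossy, encode and decode, and the escape scheme ("_" becomes "__", "." becomes "_d") are all assumptions made for illustration. None of this is Kafka code or a proposal made in the thread; lossy only mimics the "." to "_" replacement Gwen describes.

```scala
object TopicMetricNames {

  /** Mimics the behaviour described in the thread: every "." becomes "_". */
  def lossy(topic: String): String = topic.replace('.', '_')

  /** Hypothetical reversible encoding that uses "_" as an escape character:
    * "_" is written as "__" and "." is written as "_d".
    */
  def encode(topic: String): String =
    topic.map {
      case '_' => "__"
      case '.' => "_d"
      case c   => c.toString
    }.mkString

  /** Inverse of encode. Rejects input that encode could not have produced. */
  def decode(metric: String): String = {
    val out = new StringBuilder
    var i = 0
    while (i < metric.length) {
      val c = metric.charAt(i)
      if (c == '_') {
        // An escape is always two characters long.
        require(i + 1 < metric.length, s"dangling escape in '$metric'")
        metric.charAt(i + 1) match {
          case '_' => out.append('_')
          case 'd' => out.append('.')
          case other =>
            throw new IllegalArgumentException(s"unknown escape '_$other' in '$metric'")
        }
        i += 2
      } else {
        out.append(c)
        i += 1
      }
    }
    out.toString
  }
}
```

With this sketch, lossy("kafka.lab.2") and lossy("kafka_lab_2") produce the same name, while encode keeps the two apart and decode recovers the original topic. The trade-off is that every existing topic containing "_" gets a new metric name (kafka_lab_2 becomes kafka__lab__2), so this particular scheme does not satisfy the "won't break existing metrics" part of option 3.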