Thanks, Grant. That seems like a bad solution to the problem that John ran into in that ticket. It's entirely reasonable to have separate validators for separate things, but it seems like the choice was made to try and mash it all into a single validator. And it appears that despite the commentary in the ticket at the time, Gwen's identified a very good reason to be restrictive about topic naming.
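For illustration, here is a minimal sketch of what keeping the checks separate might look like. The character set and the 255-character limit are copied from the Topic object quoted below, and the "." to "_" replacement is the metric-name behaviour Gwen describes. The names TopicNameRules, isLegalName, metricName and collidingTopic are hypothetical and do not exist in Kafka.

```scala
// Hypothetical sketch only: none of these names exist in Kafka.
object TopicNameRules {
  // Character set and length limit copied from the Topic object quoted in this thread.
  private val legalName = "[a-zA-Z0-9\\._\\-]+".r
  private val maxNameLength = 255

  // Validator 1: is the name legal on its own (non-empty, not "." or "..",
  // within the length limit, and made up only of permitted characters)?
  def isLegalName(topic: String): Boolean =
    topic.nonEmpty &&
      topic != "." && topic != ".." &&
      topic.length <= maxNameLength &&
      legalName.pattern.matcher(topic).matches()

  // The metric-name mangling described in the thread: "." becomes "_".
  def metricName(topic: String): String = topic.replace('.', '_')

  // Validator 2: would this topic share a metric name with a different existing topic?
  // Returns the first colliding topic found, if any.
  def collidingTopic(topic: String, existing: Set[String]): Option[String] =
    existing.find(other => other != topic && metricName(other) == metricName(topic))
}
```

With this split, "kafka.lab.2" passes the name check by itself, while the collision check flags it only when "kafka_lab_2" already exists, so existing topics that use periods would not need to be renamed.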
-Todd

On Fri, Jul 10, 2015 at 1:22 PM, Grant Henke <ghe...@cloudera.com> wrote:

> Found it was added here: https://issues.apache.org/jira/browse/KAFKA-697
>
> On Fri, Jul 10, 2015 at 3:18 PM, Todd Palino <tpal...@gmail.com> wrote:
>
> > This was definitely changed at some point after KAFKA-495. The question is
> > when and why.
> >
> > Here's the relevant code from that patch:
> >
> > ===================================================================
> > --- core/src/main/scala/kafka/utils/Topic.scala (revision 1390178)
> > +++ core/src/main/scala/kafka/utils/Topic.scala (working copy)
> > @@ -21,24 +21,21 @@
> > import util.matching.Regex
> >
> > object Topic {
> > + val legalChars = "[a-zA-Z0-9_-]"
> >
> > -Todd
> >
> > On Fri, Jul 10, 2015 at 1:02 PM, Grant Henke <ghe...@cloudera.com> wrote:
> >
> > > kafka.common.Topic shows that currently period is a valid character and I
> > > have verified I can use kafka-topics.sh to create a new topic with a
> > > period.
> > >
> > > AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK currently uses
> > > Topic.validate before writing to Zookeeper.
> > >
> > > Should period character support be removed? I was under the same impression
> > > as Gwen, that a period was used by many as a way to "group" topics.
> > >
> > > The code is pasted below since its small:
> > >
> > > object Topic {
> > >   val legalChars = "[a-zA-Z0-9\\._\\-]"
> > >   private val maxNameLength = 255
> > >   private val rgx = new Regex(legalChars + "+")
> > >
> > >   val InternalTopics = Set(OffsetManager.OffsetsTopicName)
> > >
> > >   def validate(topic: String) {
> > >     if (topic.length <= 0)
> > >       throw new InvalidTopicException("topic name is illegal, can't be empty")
> > >     else if (topic.equals(".") || topic.equals(".."))
> > >       throw new InvalidTopicException("topic name cannot be \".\" or \"..\"")
> > >     else if (topic.length > maxNameLength)
> > >       throw new InvalidTopicException("topic name is illegal, can't be longer than " + maxNameLength + " characters")
> > >
> > >     rgx.findFirstIn(topic) match {
> > >       case Some(t) =>
> > >         if (!t.equals(topic))
> > >           throw new InvalidTopicException("topic name " + topic + " is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'")
> > >       case None => throw new InvalidTopicException("topic name " + topic + " is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'")
> > >     }
> > >   }
> > > }
> > >
> > > On Fri, Jul 10, 2015 at 2:50 PM, Todd Palino <tpal...@gmail.com> wrote:
> > >
> > > > I had to go look this one up again to make sure -
> > > > https://issues.apache.org/jira/browse/KAFKA-495
> > > >
> > > > The only valid character names for topics are alphanumeric, underscore, and
> > > > dash. A period is not supposed to be a valid character to use. If you're
> > > > seeing them, then one of two things have happened:
> > > >
> > > > 1) You have topic names that are grandfathered in from before that patch
> > > > 2) The patch is not working properly and there is somewhere in the broker
> > > > that the standard is not being enforced.
> > > >
> > > > -Todd
> > > >
> > > > On Fri, Jul 10, 2015 at 12:13 PM, Brock Noland <br...@apache.org> wrote:
> > > >
> > > > > On Fri, Jul 10, 2015 at 11:34 AM, Gwen Shapira <gshap...@cloudera.com>
> > > > > wrote:
> > > > > > Hi Kafka Fans,
> > > > > >
> > > > > > If you have one topic named "kafka_lab_2" and the other named
> > > > > > "kafka.lab.2", the topic level metrics will be named kafka_lab_2 for
> > > > > > both, effectively making it impossible to monitor them properly.
> > > > > >
> > > > > > The reason this happens is that using "." in topic names is pretty
> > > > > > common, especially as a way to group topics into data centers,
> > > > > > relevant apps, etc - basically a work-around to our current lack of
> > > > > > name spaces. However, most metric monitoring systems using "." to
> > > > > > annotate hierarchy, so to avoid issues around metric names, Kafka
> > > > > > replaces the "." in the name with an underscore.
> > > > > >
> > > > > > This generates good metric names, but creates the problem with name
> > > > > > collisions.
> > > > > >
> > > > > > I'm wondering if it makes sense to simply limit the range of
> > > > > > characters permitted in a topic name and disallow "_"? Obviously
> > > > > > existing topics will need to remain as is, which is a bit awkward.
> > > > >
> > > > > Interesting problem! Many if not most users I personally am aware of
> > > > > use "_" as a separator in topic names. I am sure that many users would
> > > > > be quite surprised by this limitation. With that said, I am sure
> > > > > they'd transition accordingly.
> > > > > >
> > > > > > If anyone has better backward-compatible solutions to this, I'm all
> > > > > > ears :)
> > > > > >
> > > > > > Gwen
> > >
> > > --
> > > Grant Henke
> > > Solutions Consultant | Cloudera
> > > ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>
> --
> Grant Henke
> Solutions Consultant | Cloudera
> ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke