Yeah, I have an actual customer who ran into this. Unfortunately, inconsistencies in the way things are named are pretty common - just look at Kafka's many CLI options.
I don't think that supporting both and pointing at the docs with "I told you so" when our metrics break is a good solution. On Fri, Jul 10, 2015 at 4:33 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote: > I figure you'll probably see complaints no matter what change you make. > Gwen, given that you raised this, another important question might be how > many people you see using *both*. I'm guessing this question came up > because you actually saw a conflict? But I'd imagine (or at least hope) > that most organizations are mostly consistent about naming topics -- they > standardize on one or the other. > > Since there's no "right" way to name them, I'd just leave it supporting > both and document the potential conflict in metrics. And if people use both > naming schemes, they probably deserve to suffer for their inconsistency :) > > -Ewen > > On Fri, Jul 10, 2015 at 3:28 PM, Gwen Shapira <gshap...@cloudera.com> wrote: > >> I find dots more common in my customer base, so I will definitely feel >> the pain of removing them. >> >> However, "." are already used in metrics, file names, directories, etc >> - so if we keep the dots, we need to keep code that translates them >> and document the translation. Just banning "." seems more natural. >> Also, as Grant mentioned, we'll probably have our own special usage >> for "." down the line. >> >> On Fri, Jul 10, 2015 at 2:12 PM, Todd Palino <tpal...@gmail.com> wrote: >> > I absolutely disagree with #2, Neha. That will break a lot of >> > infrastructure within LinkedIn. That said, removing "." might break other >> > people as well, but I think we should have a clearer idea of how much >> usage >> > there is on either side. >> > >> > -Todd >> > >> > >> > On Fri, Jul 10, 2015 at 2:08 PM, Neha Narkhede <n...@confluent.io> >> wrote: >> > >> >> "." seems natural for grouping topic names. +1 for 2) going forward only >> >> without breaking previously created topics with "_" though that might >> >> require us to patch the code somewhat awkwardly till we phase it out a >> >> couple (purposely left vague to stay out of Ewen's wrath :-)) versions >> >> later. >> >> >> >> On Fri, Jul 10, 2015 at 2:02 PM, Gwen Shapira <gshap...@cloudera.com> >> >> wrote: >> >> >> >> > I don't think we should break existing topics. Just disallow new >> >> > topics going forward. >> >> > >> >> > Agree that having both is horrible, but we should have a solution that >> >> > fails when you run "kafka_topics.sh --create", not when you configure >> >> > Ganglia. >> >> > >> >> > Gwen >> >> > >> >> > On Fri, Jul 10, 2015 at 1:53 PM, Jay Kreps <j...@confluent.io> wrote: >> >> > > Unfortunately '.' is pretty common too. I agree that it is perverse, >> >> but >> >> > > people seem to do it. Breaking all the topics with '.' in the name >> >> seems >> >> > > like it could be worse than combining metrics for people who have a >> >> > > 'foo_bar' AND 'foo.bar' (and after all, having both is DEEPLY >> perverse, >> >> > > no?). >> >> > > >> >> > > Where is our Dean of Compatibility, Ewen, on this? >> >> > > >> >> > > -Jay >> >> > > >> >> > > On Fri, Jul 10, 2015 at 1:32 PM, Todd Palino <tpal...@gmail.com> >> >> wrote: >> >> > > >> >> > >> My selfish point of view is that we do #1, as we use "_" >> extensively >> >> in >> >> > >> topic names here :) I also happen to think it's the right choice, >> >> > >> specifically because "." has more special meanings, as you noted. >> >> > >> >> >> > >> -Todd >> >> > >> >> >> > >> >> >> > >> On Fri, Jul 10, 2015 at 1:30 PM, Gwen Shapira < >> gshap...@cloudera.com> >> >> > >> wrote: >> >> > >> >> >> > >> > Unintentional side effect from allowing IP addresses in consumer >> >> > client >> >> > >> > IDs :) >> >> > >> > >> >> > >> > So the question is, what do we do now? >> >> > >> > >> >> > >> > 1) disallow "." >> >> > >> > 2) disallow "_" >> >> > >> > 3) find a reversible way to encode "." and "_" that won't break >> >> > existing >> >> > >> > metrics >> >> > >> > 4) all of the above? >> >> > >> > >> >> > >> > btw. it looks like "." and ".." are currently valid. Topic names >> are >> >> > >> > used for directories, right? this sounds like fun :) >> >> > >> > >> >> > >> > I vote for option #1, although if someone has a good idea for #3 >> it >> >> > >> > will be even better. >> >> > >> > >> >> > >> > Gwen >> >> > >> > >> >> > >> > >> >> > >> > >> >> > >> > On Fri, Jul 10, 2015 at 1:22 PM, Grant Henke < >> ghe...@cloudera.com> >> >> > >> wrote: >> >> > >> > > Found it was added here: >> >> > >> https://issues.apache.org/jira/browse/KAFKA-697 >> >> > >> > > >> >> > >> > > On Fri, Jul 10, 2015 at 3:18 PM, Todd Palino < >> tpal...@gmail.com> >> >> > >> wrote: >> >> > >> > > >> >> > >> > >> This was definitely changed at some point after KAFKA-495. The >> >> > >> question >> >> > >> > is >> >> > >> > >> when and why. >> >> > >> > >> >> >> > >> > >> Here's the relevant code from that patch: >> >> > >> > >> >> >> > >> > >> >> >> =================================================================== >> >> > >> > >> --- core/src/main/scala/kafka/utils/Topic.scala (revision >> >> 1390178) >> >> > >> > >> +++ core/src/main/scala/kafka/utils/Topic.scala (working copy) >> >> > >> > >> @@ -21,24 +21,21 @@ >> >> > >> > >> import util.matching.Regex >> >> > >> > >> >> >> > >> > >> object Topic { >> >> > >> > >> + val legalChars = "[a-zA-Z0-9_-]" >> >> > >> > >> >> >> > >> > >> >> >> > >> > >> >> >> > >> > >> -Todd >> >> > >> > >> >> >> > >> > >> >> >> > >> > >> On Fri, Jul 10, 2015 at 1:02 PM, Grant Henke < >> >> ghe...@cloudera.com> >> >> > >> > wrote: >> >> > >> > >> >> >> > >> > >> > kafka.common.Topic shows that currently period is a valid >> >> > character >> >> > >> > and I >> >> > >> > >> > have verified I can use kafka-topics.sh to create a new >> topic >> >> > with a >> >> > >> > >> > period. >> >> > >> > >> > >> >> > >> > >> > >> >> > >> > >> > AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK >> >> > currently >> >> > >> > uses >> >> > >> > >> > Topic.validate before writing to Zookeeper. >> >> > >> > >> > >> >> > >> > >> > Should period character support be removed? I was under the >> >> same >> >> > >> > >> impression >> >> > >> > >> > as Gwen, that a period was used by many as a way to "group" >> >> > topics. >> >> > >> > >> > >> >> > >> > >> > The code is pasted below since its small: >> >> > >> > >> > >> >> > >> > >> > object Topic { >> >> > >> > >> > val legalChars = "[a-zA-Z0-9\\._\\-]" >> >> > >> > >> > private val maxNameLength = 255 >> >> > >> > >> > private val rgx = new Regex(legalChars + "+") >> >> > >> > >> > >> >> > >> > >> > val InternalTopics = Set(OffsetManager.OffsetsTopicName) >> >> > >> > >> > >> >> > >> > >> > def validate(topic: String) { >> >> > >> > >> > if (topic.length <= 0) >> >> > >> > >> > throw new InvalidTopicException("topic name is >> illegal, >> >> > can't >> >> > >> be >> >> > >> > >> > empty") >> >> > >> > >> > else if (topic.equals(".") || topic.equals("..")) >> >> > >> > >> > throw new InvalidTopicException("topic name cannot be >> >> > \".\" or >> >> > >> > >> > \"..\"") >> >> > >> > >> > else if (topic.length > maxNameLength) >> >> > >> > >> > throw new InvalidTopicException("topic name is >> illegal, >> >> > can't >> >> > >> be >> >> > >> > >> > longer than " + maxNameLength + " characters") >> >> > >> > >> > >> >> > >> > >> > rgx.findFirstIn(topic) match { >> >> > >> > >> > case Some(t) => >> >> > >> > >> > if (!t.equals(topic)) >> >> > >> > >> > throw new InvalidTopicException("topic name " + >> topic >> >> > + " >> >> > >> is >> >> > >> > >> > illegal, contains a character other than ASCII >> alphanumerics, >> >> > '.', >> >> > >> '_' >> >> > >> > >> and >> >> > >> > >> > '-'") >> >> > >> > >> > case None => throw new InvalidTopicException("topic >> name >> >> " >> >> > + >> >> > >> > topic >> >> > >> > >> + >> >> > >> > >> > " is illegal, contains a character other than ASCII >> >> > alphanumerics, >> >> > >> > '.', >> >> > >> > >> > '_' and '-'") >> >> > >> > >> > } >> >> > >> > >> > } >> >> > >> > >> > } >> >> > >> > >> > >> >> > >> > >> > On Fri, Jul 10, 2015 at 2:50 PM, Todd Palino < >> >> tpal...@gmail.com> >> >> > >> > wrote: >> >> > >> > >> > >> >> > >> > >> > > I had to go look this one up again to make sure - >> >> > >> > >> > > https://issues.apache.org/jira/browse/KAFKA-495 >> >> > >> > >> > > >> >> > >> > >> > > The only valid character names for topics are >> alphanumeric, >> >> > >> > underscore, >> >> > >> > >> > and >> >> > >> > >> > > dash. A period is not supposed to be a valid character to >> >> use. >> >> > If >> >> > >> > >> you're >> >> > >> > >> > > seeing them, then one of two things have happened: >> >> > >> > >> > > >> >> > >> > >> > > 1) You have topic names that are grandfathered in from >> before >> >> > that >> >> > >> > >> patch >> >> > >> > >> > > 2) The patch is not working properly and there is >> somewhere >> >> in >> >> > the >> >> > >> > >> broker >> >> > >> > >> > > that the standard is not being enforced. >> >> > >> > >> > > >> >> > >> > >> > > -Todd >> >> > >> > >> > > >> >> > >> > >> > > >> >> > >> > >> > > On Fri, Jul 10, 2015 at 12:13 PM, Brock Noland < >> >> > br...@apache.org> >> >> > >> > >> wrote: >> >> > >> > >> > > >> >> > >> > >> > > > On Fri, Jul 10, 2015 at 11:34 AM, Gwen Shapira < >> >> > >> > >> gshap...@cloudera.com> >> >> > >> > >> > > > wrote: >> >> > >> > >> > > > > Hi Kafka Fans, >> >> > >> > >> > > > > >> >> > >> > >> > > > > If you have one topic named "kafka_lab_2" and the >> other >> >> > named >> >> > >> > >> > > > > "kafka.lab.2", the topic level metrics will be named >> >> > >> kafka_lab_2 >> >> > >> > >> for >> >> > >> > >> > > > > both, effectively making it impossible to monitor them >> >> > >> properly. >> >> > >> > >> > > > > >> >> > >> > >> > > > > The reason this happens is that using "." in topic >> names >> >> is >> >> > >> > pretty >> >> > >> > >> > > > > common, especially as a way to group topics into data >> >> > centers, >> >> > >> > >> > > > > relevant apps, etc - basically a work-around to our >> >> current >> >> > >> > lack of >> >> > >> > >> > > > > name spaces. However, most metric monitoring systems >> >> using >> >> > "." >> >> > >> > to >> >> > >> > >> > > > > annotate hierarchy, so to avoid issues around metric >> >> names, >> >> > >> > Kafka >> >> > >> > >> > > > > replaces the "." in the name with an underscore. >> >> > >> > >> > > > > >> >> > >> > >> > > > > This generates good metric names, but creates the >> problem >> >> > with >> >> > >> > name >> >> > >> > >> > > > collisions. >> >> > >> > >> > > > > >> >> > >> > >> > > > > I'm wondering if it makes sense to simply limit the >> range >> >> > of >> >> > >> > >> > > > > characters permitted in a topic name and disallow "_"? >> >> > >> Obviously >> >> > >> > >> > > > > existing topics will need to remain as is, which is a >> bit >> >> > >> > awkward. >> >> > >> > >> > > > >> >> > >> > >> > > > Interesting problem! Many if not most users I >> personally am >> >> > >> aware >> >> > >> > of >> >> > >> > >> > > > use "_" as a separator in topic names. I am sure that >> many >> >> > users >> >> > >> > >> would >> >> > >> > >> > > > be quite surprised by this limitation. With that said, >> I am >> >> > sure >> >> > >> > >> > > > they'd transition accordingly. >> >> > >> > >> > > > >> >> > >> > >> > > > > >> >> > >> > >> > > > > If anyone has better backward-compatible solutions to >> >> this, >> >> > >> I'm >> >> > >> > all >> >> > >> > >> > > ears >> >> > >> > >> > > > :) >> >> > >> > >> > > > > >> >> > >> > >> > > > > Gwen >> >> > >> > >> > > > >> >> > >> > >> > > >> >> > >> > >> > >> >> > >> > >> > >> >> > >> > >> > >> >> > >> > >> > -- >> >> > >> > >> > Grant Henke >> >> > >> > >> > Solutions Consultant | Cloudera >> >> > >> > >> > ghe...@cloudera.com | twitter.com/gchenke | >> >> > >> > linkedin.com/in/granthenke >> >> > >> > >> > >> >> > >> > >> >> >> > >> > > >> >> > >> > > >> >> > >> > > >> >> > >> > > -- >> >> > >> > > Grant Henke >> >> > >> > > Solutions Consultant | Cloudera >> >> > >> > > ghe...@cloudera.com | twitter.com/gchenke | >> >> > linkedin.com/in/granthenke >> >> > >> > >> >> > >> >> >> > >> >> >> >> >> >> >> >> -- >> >> Thanks, >> >> Neha >> >> >> > > > > -- > Thanks, > Ewen