I find dots more common in my customer base, so I will definitely feel the pain of removing them.
However, "." are already used in metrics, file names, directories, etc - so if we keep the dots, we need to keep code that translates them and document the translation. Just banning "." seems more natural. Also, as Grant mentioned, we'll probably have our own special usage for "." down the line.

On Fri, Jul 10, 2015 at 2:12 PM, Todd Palino <tpal...@gmail.com> wrote:

> I absolutely disagree with #2, Neha. That will break a lot of
> infrastructure within LinkedIn. That said, removing "." might break other
> people as well, but I think we should have a clearer idea of how much usage
> there is on either side.
>
> -Todd
>
>
> On Fri, Jul 10, 2015 at 2:08 PM, Neha Narkhede <n...@confluent.io> wrote:
>
>> "." seems natural for grouping topic names. +1 for 2) going forward only
>> without breaking previously created topics with "_" though that might
>> require us to patch the code somewhat awkwardly till we phase it out a
>> couple (purposely left vague to stay out of Ewen's wrath :-)) versions
>> later.
>>
>> On Fri, Jul 10, 2015 at 2:02 PM, Gwen Shapira <gshap...@cloudera.com>
>> wrote:
>>
>> > I don't think we should break existing topics. Just disallow new
>> > topics going forward.
>> >
>> > Agree that having both is horrible, but we should have a solution that
>> > fails when you run "kafka_topics.sh --create", not when you configure
>> > Ganglia.
>> >
>> > Gwen
>> >
>> > On Fri, Jul 10, 2015 at 1:53 PM, Jay Kreps <j...@confluent.io> wrote:
>> > > Unfortunately '.' is pretty common too. I agree that it is perverse, but
>> > > people seem to do it. Breaking all the topics with '.' in the name seems
>> > > like it could be worse than combining metrics for people who have a
>> > > 'foo_bar' AND 'foo.bar' (and after all, having both is DEEPLY perverse,
>> > > no?).
>> > >
>> > > Where is our Dean of Compatibility, Ewen, on this?
>> > >
>> > > -Jay
>> > >
>> > > On Fri, Jul 10, 2015 at 1:32 PM, Todd Palino <tpal...@gmail.com> wrote:
>> > >
>> > >> My selfish point of view is that we do #1, as we use "_" extensively in
>> > >> topic names here :) I also happen to think it's the right choice,
>> > >> specifically because "." has more special meanings, as you noted.
>> > >>
>> > >> -Todd
>> > >>
>> > >>
>> > >> On Fri, Jul 10, 2015 at 1:30 PM, Gwen Shapira <gshap...@cloudera.com>
>> > >> wrote:
>> > >>
>> > >> > Unintentional side effect from allowing IP addresses in consumer client
>> > >> > IDs :)
>> > >> >
>> > >> > So the question is, what do we do now?
>> > >> >
>> > >> > 1) disallow "."
>> > >> > 2) disallow "_"
>> > >> > 3) find a reversible way to encode "." and "_" that won't break existing
>> > >> > metrics
>> > >> > 4) all of the above?
>> > >> >
>> > >> > btw. it looks like "." and ".." are currently valid. Topic names are
>> > >> > used for directories, right? this sounds like fun :)
>> > >> >
>> > >> > I vote for option #1, although if someone has a good idea for #3 it
>> > >> > will be even better.
>> > >> >
>> > >> > Gwen
>> > >> >
>> > >> >
>> > >> >
>> > >> > On Fri, Jul 10, 2015 at 1:22 PM, Grant Henke <ghe...@cloudera.com> wrote:
>> > >> > > Found it was added here: https://issues.apache.org/jira/browse/KAFKA-697
>> > >> > >
>> > >> > > On Fri, Jul 10, 2015 at 3:18 PM, Todd Palino <tpal...@gmail.com> wrote:
>> > >> > >
>> > >> > >> This was definitely changed at some point after KAFKA-495. The question is
>> > >> > >> when and why.
>> > >> > >>
>> > >> > >> Here's the relevant code from that patch:
>> > >> > >>
>> > >> > >> ===================================================================
>> > >> > >> --- core/src/main/scala/kafka/utils/Topic.scala (revision 1390178)
>> > >> > >> +++ core/src/main/scala/kafka/utils/Topic.scala (working copy)
>> > >> > >> @@ -21,24 +21,21 @@
>> > >> > >>  import util.matching.Regex
>> > >> > >>
>> > >> > >>  object Topic {
>> > >> > >> +  val legalChars = "[a-zA-Z0-9_-]"
>> > >> > >>
>> > >> > >>
>> > >> > >>
>> > >> > >> -Todd
>> > >> > >>
>> > >> > >>
>> > >> > >> On Fri, Jul 10, 2015 at 1:02 PM, Grant Henke <ghe...@cloudera.com> wrote:
>> > >> > >>
>> > >> > >> > kafka.common.Topic shows that currently period is a valid character and I
>> > >> > >> > have verified I can use kafka-topics.sh to create a new topic with a
>> > >> > >> > period.
>> > >> > >> >
>> > >> > >> >
>> > >> > >> > AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK currently uses
>> > >> > >> > Topic.validate before writing to Zookeeper.
>> > >> > >> >
>> > >> > >> > Should period character support be removed? I was under the same impression
>> > >> > >> > as Gwen, that a period was used by many as a way to "group" topics.
>> > >> > >> >
>> > >> > >> > The code is pasted below since its small:
>> > >> > >> >
>> > >> > >> > object Topic {
>> > >> > >> >   val legalChars = "[a-zA-Z0-9\\._\\-]"
>> > >> > >> >   private val maxNameLength = 255
>> > >> > >> >   private val rgx = new Regex(legalChars + "+")
>> > >> > >> >
>> > >> > >> >   val InternalTopics = Set(OffsetManager.OffsetsTopicName)
>> > >> > >> >
>> > >> > >> >   def validate(topic: String) {
>> > >> > >> >     if (topic.length <= 0)
>> > >> > >> >       throw new InvalidTopicException("topic name is illegal, can't be empty")
>> > >> > >> >     else if (topic.equals(".") || topic.equals(".."))
>> > >> > >> >       throw new InvalidTopicException("topic name cannot be \".\" or \"..\"")
>> > >> > >> >     else if (topic.length > maxNameLength)
>> > >> > >> >       throw new InvalidTopicException("topic name is illegal, can't be longer than " + maxNameLength + " characters")
>> > >> > >> >
>> > >> > >> >     rgx.findFirstIn(topic) match {
>> > >> > >> >       case Some(t) =>
>> > >> > >> >         if (!t.equals(topic))
>> > >> > >> >           throw new InvalidTopicException("topic name " + topic + " is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'")
>> > >> > >> >       case None => throw new InvalidTopicException("topic name " + topic + " is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'")
>> > >> > >> >     }
>> > >> > >> >   }
>> > >> > >> > }
>> > >> > >> >
>> > >> > >> > On Fri, Jul 10, 2015 at 2:50 PM, Todd Palino <tpal...@gmail.com> wrote:
>> > >> > >> >
>> > >> > >> > > I had to go look this one up again to make sure -
>> > >> > >> > > https://issues.apache.org/jira/browse/KAFKA-495
>> > >> > >> > >
>> > >> > >> > > The only valid character names for topics are alphanumeric, underscore, and
>> > >> > >> > > dash. A period is not supposed to be a valid character to use. If you're
>> > >> > >> > > seeing them, then one of two things have happened:
>> > >> > >> > >
>> > >> > >> > > 1) You have topic names that are grandfathered in from before that patch
>> > >> > >> > > 2) The patch is not working properly and there is somewhere in the broker
>> > >> > >> > > that the standard is not being enforced.
>> > >> > >> > >
>> > >> > >> > > -Todd
>> > >> > >> > >
>> > >> > >> > >
>> > >> > >> > > On Fri, Jul 10, 2015 at 12:13 PM, Brock Noland <br...@apache.org> wrote:
>> > >> > >> > >
>> > >> > >> > > > On Fri, Jul 10, 2015 at 11:34 AM, Gwen Shapira <gshap...@cloudera.com>
>> > >> > >> > > > wrote:
>> > >> > >> > > > > Hi Kafka Fans,
>> > >> > >> > > > >
>> > >> > >> > > > > If you have one topic named "kafka_lab_2" and the other named
>> > >> > >> > > > > "kafka.lab.2", the topic level metrics will be named kafka_lab_2 for
>> > >> > >> > > > > both, effectively making it impossible to monitor them properly.
>> > >> > >> > > > >
>> > >> > >> > > > > The reason this happens is that using "." in topic names is pretty
>> > >> > >> > > > > common, especially as a way to group topics into data centers,
>> > >> > >> > > > > relevant apps, etc - basically a work-around to our current lack of
>> > >> > >> > > > > name spaces. However, most metric monitoring systems using "." to
>> > >> > >> > > > > annotate hierarchy, so to avoid issues around metric names, Kafka
>> > >> > >> > > > > replaces the "." in the name with an underscore.
>> > >> > >> > > > >
>> > >> > >> > > > > This generates good metric names, but creates the problem with name
>> > >> > >> > > > > collisions.
>> > >> > >> > > > >
>> > >> > >> > > > > I'm wondering if it makes sense to simply limit the range of
>> > >> > >> > > > > characters permitted in a topic name and disallow "_"? Obviously
>> > >> > >> > > > > existing topics will need to remain as is, which is a bit awkward.
>> > >> > >> > > >
>> > >> > >> > > > Interesting problem! Many if not most users I personally am aware of
>> > >> > >> > > > use "_" as a separator in topic names. I am sure that many users would
>> > >> > >> > > > be quite surprised by this limitation. With that said, I am sure
>> > >> > >> > > > they'd transition accordingly.
>> > >> > >> > > >
>> > >> > >> > > > >
>> > >> > >> > > > > If anyone has better backward-compatible solutions to this, I'm all ears
>> > >> > >> > > > > :)
>> > >> > >> > > > >
>> > >> > >> > > > > Gwen
>> > >> > >> > > >
>> > >> > >> > >
>> > >> > >> >
>> > >> > >> >
>> > >> > >> >
>> > >> > >> > --
>> > >> > >> > Grant Henke
>> > >> > >> > Solutions Consultant | Cloudera
>> > >> > >> > ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>> > >> > >> >
>> > >> > >>
>> > >> > >
>> > >> > >
>> > >> > >
>> > >> > > --
>> > >> > > Grant Henke
>> > >> > > Solutions Consultant | Cloudera
>> > >> > > ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>> > >> >
>> > >>
>>
>>
>>
>> --
>> Thanks,
>> Neha
>>
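
For anyone skimming the thread, here is a rough sketch of the collision Gwen described and of what option #1 (disallow "." for newly created topics) could look like. It is only an illustration, not the actual Kafka code: `TopicNameSketch`, `metricName`, `validateNewTopic` and `collisions` are made-up names, the "." to "_" replacement is a simplified stand-in for whatever the metrics code really does, and the check returns an `Either` instead of throwing `InvalidTopicException` so that it has no Kafka dependencies. The character class is the one from the KAFKA-495 patch quoted above.

```scala
import scala.util.matching.Regex

object TopicNameSketch {
  // Assumption: simplified stand-in for the metric-name translation
  // described in the thread, where every "." becomes "_".
  def metricName(topic: String): String = topic.replace('.', '_')

  // Character class from the KAFKA-495 patch quoted in the thread (no ".").
  private val strictChars: Regex = "[a-zA-Z0-9_-]+".r
  private val maxNameLength = 255

  // Hypothetical check applied only when a topic is created (option #1).
  // Left carries the rejection reason, Right carries the accepted name.
  def validateNewTopic(topic: String): Either[String, String] =
    if (topic.isEmpty)
      Left("topic name is illegal, can't be empty")
    else if (topic.length > maxNameLength)
      Left(s"topic name is illegal, can't be longer than $maxNameLength characters")
    else if (!strictChars.pattern.matcher(topic).matches())
      Left(s"topic name $topic is illegal, contains a character other than ASCII alphanumerics, '_' and '-'")
    else
      Right(topic)

  // Groups existing topics by metric name and keeps only the groups
  // where more than one topic maps to the same metric name.
  def collisions(topics: Seq[String]): Map[String, Seq[String]] =
    topics.groupBy(metricName).filter { case (_, ts) => ts.size > 1 }
}
```

Calling `TopicNameSketch.collisions(Seq("kafka_lab_2", "kafka.lab.2", "orders"))` reports a single colliding metric name, `kafka_lab_2`, shared by the first two topics. `validateNewTopic("kafka.lab.2")` is rejected while `validateNewTopic("kafka_lab_2")` is accepted, so the failure shows up at topic creation time rather than when metrics are configured. Existing topics with "." are untouched by this sketch, since it only models the creation path.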