I absolutely disagree with #2, Neha. That will break a lot of infrastructure within LinkedIn. That said, removing "." might break other people as well, but I think we should have a clearer idea of how much usage there is on either side.
-Todd

On Fri, Jul 10, 2015 at 2:08 PM, Neha Narkhede <n...@confluent.io> wrote:

> "." seems natural for grouping topic names. +1 for 2) going forward only
> without breaking previously created topics with "_" though that might
> require us to patch the code somewhat awkwardly till we phase it out a
> couple (purposely left vague to stay out of Ewen's wrath :-)) versions
> later.
>
> On Fri, Jul 10, 2015 at 2:02 PM, Gwen Shapira <gshap...@cloudera.com>
> wrote:
>
> > I don't think we should break existing topics. Just disallow new
> > topics going forward.
> >
> > Agree that having both is horrible, but we should have a solution that
> > fails when you run "kafka_topics.sh --create", not when you configure
> > Ganglia.
> >
> > Gwen
> >
> > On Fri, Jul 10, 2015 at 1:53 PM, Jay Kreps <j...@confluent.io> wrote:
> > > Unfortunately '.' is pretty common too. I agree that it is perverse, but
> > > people seem to do it. Breaking all the topics with '.' in the name seems
> > > like it could be worse than combining metrics for people who have a
> > > 'foo_bar' AND 'foo.bar' (and after all, having both is DEEPLY perverse,
> > > no?).
> > >
> > > Where is our Dean of Compatibility, Ewen, on this?
> > >
> > > -Jay
> > >
> > > On Fri, Jul 10, 2015 at 1:32 PM, Todd Palino <tpal...@gmail.com> wrote:
> > >
> > > > My selfish point of view is that we do #1, as we use "_" extensively in
> > > > topic names here :) I also happen to think it's the right choice,
> > > > specifically because "." has more special meanings, as you noted.
> > > >
> > > > -Todd
> > > >
> > > > On Fri, Jul 10, 2015 at 1:30 PM, Gwen Shapira <gshap...@cloudera.com>
> > > > wrote:
> > > >
> > > > > Unintentional side effect from allowing IP addresses in consumer
> > > > > client IDs :)
> > > > >
> > > > > So the question is, what do we do now?
> > > > >
> > > > > 1) disallow "."
> > > > > 2) disallow "_"
> > > > > 3) find a reversible way to encode "." and "_" that won't break
> > > > > existing metrics
> > > > > 4) all of the above?
> > > > >
> > > > > btw. it looks like "." and ".." are currently valid. Topic names are
> > > > > used for directories, right? this sounds like fun :)
> > > > >
> > > > > I vote for option #1, although if someone has a good idea for #3 it
> > > > > will be even better.
> > > > >
> > > > > Gwen
> > > > >
> > > > > On Fri, Jul 10, 2015 at 1:22 PM, Grant Henke <ghe...@cloudera.com>
> > > > > wrote:
> > > > > > Found it was added here:
> > > > > > https://issues.apache.org/jira/browse/KAFKA-697
> > > > > >
> > > > > > On Fri, Jul 10, 2015 at 3:18 PM, Todd Palino <tpal...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > This was definitely changed at some point after KAFKA-495. The
> > > > > > > question is when and why.
> > > > > > >
> > > > > > > Here's the relevant code from that patch:
> > > > > > >
> > > > > > > ===================================================================
> > > > > > > --- core/src/main/scala/kafka/utils/Topic.scala (revision 1390178)
> > > > > > > +++ core/src/main/scala/kafka/utils/Topic.scala (working copy)
> > > > > > > @@ -21,24 +21,21 @@
> > > > > > >  import util.matching.Regex
> > > > > > >
> > > > > > >  object Topic {
> > > > > > > +  val legalChars = "[a-zA-Z0-9_-]"
> > > > > > >
> > > > > > > -Todd
> > > > > > >
> > > > > > > On Fri, Jul 10, 2015 at 1:02 PM, Grant Henke <ghe...@cloudera.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > kafka.common.Topic shows that currently period is a valid
> > > > > > > > character and I have verified I can use kafka-topics.sh to
> > > > > > > > create a new topic with a period.
> > > > > > > >
> > > > > > > > AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK
> > > > > > > > currently uses Topic.validate before writing to Zookeeper.
> > > > > > > >
> > > > > > > > Should period character support be removed? I was under the same
> > > > > > > > impression as Gwen, that a period was used by many as a way to
> > > > > > > > "group" topics.
> > > > > > > >
> > > > > > > > The code is pasted below since its small:
> > > > > > > >
> > > > > > > > object Topic {
> > > > > > > >   val legalChars = "[a-zA-Z0-9\\._\\-]"
> > > > > > > >   private val maxNameLength = 255
> > > > > > > >   private val rgx = new Regex(legalChars + "+")
> > > > > > > >
> > > > > > > >   val InternalTopics = Set(OffsetManager.OffsetsTopicName)
> > > > > > > >
> > > > > > > >   def validate(topic: String) {
> > > > > > > >     if (topic.length <= 0)
> > > > > > > >       throw new InvalidTopicException("topic name is illegal, can't be empty")
> > > > > > > >     else if (topic.equals(".") || topic.equals(".."))
> > > > > > > >       throw new InvalidTopicException("topic name cannot be \".\" or \"..\"")
> > > > > > > >     else if (topic.length > maxNameLength)
> > > > > > > >       throw new InvalidTopicException("topic name is illegal, can't be longer than " + maxNameLength + " characters")
> > > > > > > >
> > > > > > > >     rgx.findFirstIn(topic) match {
> > > > > > > >       case Some(t) =>
> > > > > > > >         if (!t.equals(topic))
> > > > > > > >           throw new InvalidTopicException("topic name " + topic + " is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'")
> > > > > > > >       case None => throw new InvalidTopicException("topic name " + topic + " is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'")
> > > > > > > >     }
> > > > > > > >   }
> > > > > > > > }
> > > > > > > >
> > > > > > > > On Fri, Jul 10, 2015 at 2:50 PM, Todd Palino <tpal...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > I had to go look this one up again to make sure -
> > > > > > > > > https://issues.apache.org/jira/browse/KAFKA-495
> > > > > > > > >
> > > > > > > > > The only valid character names for topics are alphanumeric,
> > > > > > > > > underscore, and dash. A period is not supposed to be a valid
> > > > > > > > > character to use. If you're seeing them, then one of two
> > > > > > > > > things have happened:
> > > > > > > > >
> > > > > > > > > 1) You have topic names that are grandfathered in from before
> > > > > > > > > that patch
> > > > > > > > > 2) The patch is not working properly and there is somewhere in
> > > > > > > > > the broker that the standard is not being enforced.
> > > > > > > > >
> > > > > > > > > -Todd
> > > > > > > > >
> > > > > > > > > On Fri, Jul 10, 2015 at 12:13 PM, Brock Noland
> > > > > > > > > <br...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > On Fri, Jul 10, 2015 at 11:34 AM, Gwen Shapira
> > > > > > > > > > <gshap...@cloudera.com> wrote:
> > > > > > > > > > > Hi Kafka Fans,
> > > > > > > > > > >
> > > > > > > > > > > If you have one topic named "kafka_lab_2" and the other named
> > > > > > > > > > > "kafka.lab.2", the topic level metrics will be named kafka_lab_2 for
> > > > > > > > > > > both, effectively making it impossible to monitor them properly.
> > > > > > > > > > >
> > > > > > > > > > > The reason this happens is that using "." in topic names is pretty
> > > > > > > > > > > common, especially as a way to group topics into data centers,
> > > > > > > > > > > relevant apps, etc - basically a work-around to our current lack of
> > > > > > > > > > > name spaces. However, most metric monitoring systems using "." to
> > > > > > > > > > > annotate hierarchy, so to avoid issues around metric names, Kafka
> > > > > > > > > > > replaces the "." in the name with an underscore.
> > > > > > > > > > >
> > > > > > > > > > > This generates good metric names, but creates the problem with name
> > > > > > > > > > > collisions.
> > > > > > > > > > >
> > > > > > > > > > > I'm wondering if it makes sense to simply limit the range of
> > > > > > > > > > > characters permitted in a topic name and disallow "_"? Obviously
> > > > > > > > > > > existing topics will need to remain as is, which is a bit awkward.
> > > > > > > > > >
> > > > > > > > > > Interesting problem! Many if not most users I personally am aware of
> > > > > > > > > > use "_" as a separator in topic names. I am sure that many users would
> > > > > > > > > > be quite surprised by this limitation. With that said, I am sure
> > > > > > > > > > they'd transition accordingly.
> > > > > > > > > >
> > > > > > > > > > > If anyone has better backward-compatible solutions to this, I'm all
> > > > > > > > > > > ears :)
> > > > > > > > > > >
> > > > > > > > > > > Gwen
> > > > > > > >
> > > > > > > > --
> > > > > > > > Grant Henke
> > > > > > > > Solutions Consultant | Cloudera
> > > > > > > > ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
> > > > > >
> > > > > > --
> > > > > > Grant Henke
> > > > > > Solutions Consultant | Cloudera
> > > > > > ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>
> --
> Thanks,
> Neha
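
To make the collision in this thread concrete, here is a minimal sketch of a create-time check, along the lines of the suggestion that the failure should happen at topic creation and not when monitoring is configured. It assumes the metric name is derived by replacing "." with "_", as described in the original message. The object and method names (TopicCollisionSketch, metricName, collisions, validateNewTopic) are hypothetical and are not part of Kafka's code.

```scala
object TopicCollisionSketch {
  // Assumption: mirrors the behavior described in the thread, where "." in a
  // topic name is replaced by "_" when the metric name is built.
  def metricName(topic: String): String = topic.replace('.', '_')

  // Existing topics (other than the candidate itself) whose metric name would
  // be identical to the candidate's metric name.
  def collisions(candidate: String, existing: Set[String]): Set[String] =
    existing.filter(t => t != candidate && metricName(t) == metricName(candidate))

  // Hypothetical create-time check: reject a new topic only when it would
  // collide, so previously created topics are left untouched.
  def validateNewTopic(candidate: String, existing: Set[String]): Either[String, String] = {
    val clashes = collisions(candidate, existing)
    if (clashes.isEmpty) Right(candidate)
    else Left(s"topic '$candidate' collides in metric names with: " +
      clashes.toSeq.sorted.mkString(", "))
  }
}
```

With this sketch, creating "kafka.lab.2" while "kafka_lab_2" exists yields a Left, while either name alone passes. It leaves both "." and "_" legal and only blocks the pairs that would share a metric name, which is a different trade-off from options 1 and 2 above.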