Hi, since dots seem to be a problem on the metrics side, why not let the metrics side handle it by escaping troublesome characters? E.g. "foo.my\.topic.feh" Let's not push the problem upstream.
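A minimal sketch of the escaping idea, assuming a hypothetical `MetricNameEscaping` helper on the metrics side (none of these names exist in Kafka or in the Coda Hale metrics library). It contrasts reversible escaping with the lossy "." to "_" replacement discussed in the quoted thread:

```scala
// Hypothetical helper, not part of Kafka: escape characters that the metrics
// backend treats as separators instead of replacing them.
object MetricNameEscaping {
  // Escape the backslash first so that unescaping stays unambiguous.
  def escape(component: String): String =
    component.replace("\\", "\\\\").replace(".", "\\.")

  // Reverse of escape: a backslash always protects the character after it.
  def unescape(escaped: String): String = {
    val out = new StringBuilder
    var i = 0
    while (i < escaped.length) {
      val c = escaped.charAt(i)
      if (c == '\\' && i + 1 < escaped.length) {
        out.append(escaped.charAt(i + 1))
        i += 2
      } else {
        out.append(c)
        i += 1
      }
    }
    out.toString
  }

  // Lossy alternative: replace every "." with "_".
  def replaceDots(component: String): String = component.replace('.', '_')

  def main(args: Array[String]): Unit = {
    // Escaping keeps the topic name recoverable inside a dotted metric name.
    println("foo." + escape("my.topic") + ".feh") // foo.my\.topic.feh
    println(unescape(escape("my.topic")))          // my.topic

    // Replacement maps two different topics onto the same metric name.
    println(replaceDots("kafka.lab.2") == replaceDots("kafka_lab_2")) // true
    println(escape("kafka.lab.2") == escape("kafka_lab_2"))           // false
  }
}
```

Whether a given reporter (Graphite, for example) accepts a backslash in a metric path is a separate question that this sketch does not answer.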
Replacing "." with another set of allowed characters "__" seems like a bad idea since it is ambiguous: "__consumer_offsets" == ".consumer_offsets"? I'm guessing the same problem arises if broker names are part of the metrics name, e.g., "broker.192.168.0.2.rxbytes", do we want to push the exclusion of dots in IP addresses upstream as well? :)

Magnus

2015-07-13 2:06 GMT+02:00 Jun Rao <j...@confluent.io>:

> First, a couple of clarifications on this.
>
> 1. Currently, we allow Kafka topic names to have dots, except that we disallow topic names that are exactly "." or ".." (which can cause weird problems when mapping to file directories and ZK paths, as Gwen pointed out).
>
> 2. When creating the Coda Hale metrics, currently, we only replace dot with _ in the scope of the metric name. The actual JMX bean name still preserves the dot. This is because the Graphite reporter uses scope when forming the metric names and assumes dots are component separators (see KAFKA-1902 for details). So, if one uses tools like jmxtrans to export the metrics from the mbeans directly, the original topic name is preserved. However, I am not sure how well this maps to Graphite. We thought about making the replacement character configurable. However, the difficulty is that the logic of doing the replacement is in a singleton class KafkaMetricsGroup and I am not sure if we can pass in an external config.
>
> Given the above, I'd suggest that the customer try the jmxtrans to Graphite path and see if that helps. I agree that it's too disruptive to restrict the current topic naming convention.
>
> Also, since we plan to replace Coda Hale metrics with Kafka metrics in the future, we can try to address this issue better then.
>
> Thanks,
>
> Jun
>
> On Sun, Jul 12, 2015 at 10:26 AM, Gwen Shapira <gshap...@cloudera.com> wrote:
>
> > I like the "let's warn people of conflicts when creating the topic" suggestion.
> > IMO, automatic topic creation as currently done is buggy either way (Send data and hope the topic is ready before retries run out, potentially failing with the super helpful NO_LEADER error), so I don't mind leaving it broken a bit more. I think the right behavior is that conflicts will cause auto creation to fail, the same way we currently do when the default number of replicas is higher than the number of brokers.
> >
> > One thing that is left confusing is that people in the "." camp need to know about the conversion or they will fail to find their topics in their monitoring tools. Not very nice to them, but I can't think of alternatives.
> >
> > I'll start with the doc patch :)
> >
> > On Sat, Jul 11, 2015 at 12:54 AM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
> > > On Fri, Jul 10, 2015 at 4:41 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
> > >
> > >> Yeah, I have an actual customer who ran into this. Unfortunately, inconsistencies in the way things are named are pretty common - just look at Kafka's many CLI options.
> > >>
> > >> I don't think that supporting both and pointing at the docs with "I told you so" when our metrics break is a good solution.
> > >>
> > >
> > > I agree, especially since we don't *already* have something in the docs indicating this will be an issue. I was flippant about the situation because I *wish* there was more careful consideration + naming policy in place, but I realize that doesn't always happen in practice. I guess I need to take Compatibility Czar more seriously :)
> > >
> > > I think the obvious practical options are as follows:
> > >
> > > 1. Kill support for "_". Piss off the entire set of people who currently use "_" anywhere in topic names.
> > > 2. Kill support for ".". Piss off the entire set of people who currently use "." anywhere in topic names.
> > > 3. Tell people they need to be careful about this issue. Piss off the set of people who use both "_" and "." *and* happen to have conflicting topic names. They will have some pain when they discover the issue and have to figure out how to move one of those topics over to a non-conflicting name. I'm going to claim that this group must be an *extremely* small fraction of users, which doesn't make it better to allow things to break for them, but at least gives us an idea of the scale of impact.
> > >
> > > (One other alternative suggested earlier was encoding metric names to account for differences; given the metric renaming mess in the last release, I'm extremely hesitant to suggest anything of the sort...)
> > >
> > > None of the options are ideal, but to me, 3 seems like the least painful. Both for us, and for the vast majority of users. It seems to me that the number of users that would complain about (1) or (2) drastically outweighs (3).
> > >
> > > At this point, I don't think it's practical to keep switching the rules about which characters are allowed and which aren't because the previous attempts haven't been successful -- it seems the rules have changed multiple times, whether intentionally or accidentally, such that any more changes will cause problems. At this point, I think we just need to accept being liberal in accepting the range of topic names that have been permitted so far and make the best of the situation, even if it means only being able to warn people of conflicts.
> > >
> > > Here's another alternative: how about being liberal with topic name characters, but upon topic creation we convert the name to the metric name and fail if there's a conflict with another topic?
> > > This is relatively expensive (requires getting the metric name of all other topics), but it avoids the bad situation we're encountering here (conflicting metrics), avoids getting into a persistent conflict (we kill topic creation when we detect the issue rather than noticing it when the metrics conflict happens), and keeps the vast majority of existing users happy (both _ and . work in topic names as long as you don't create topics with conflicting metric names).
> > >
> > > There are definitely details to be worked out (auto topic creation?), but it seems like a more realistic solution than to start disallowing _ or . in topic names.
> > >
> > > -Ewen
> > >
> > >>
> > >> On Fri, Jul 10, 2015 at 4:33 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
> > >> > I figure you'll probably see complaints no matter what change you make. Gwen, given that you raised this, another important question might be how many people you see using *both*. I'm guessing this question came up because you actually saw a conflict? But I'd imagine (or at least hope) that most organizations are mostly consistent about naming topics -- they standardize on one or the other.
> > >> >
> > >> > Since there's no "right" way to name them, I'd just leave it supporting both and document the potential conflict in metrics. And if people use both naming schemes, they probably deserve to suffer for their inconsistency :)
> > >> >
> > >> > -Ewen
> > >> >
> > >> > On Fri, Jul 10, 2015 at 3:28 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
> > >> >
> > >> >> I find dots more common in my customer base, so I will definitely feel the pain of removing them.
> > >> >>
> > >> >> However, "." is already used in metrics, file names, directories, etc - so if we keep the dots, we need to keep code that translates them and document the translation. Just banning "." seems more natural. Also, as Grant mentioned, we'll probably have our own special usage for "." down the line.
> > >> >>
> > >> >> On Fri, Jul 10, 2015 at 2:12 PM, Todd Palino <tpal...@gmail.com> wrote:
> > >> >> > I absolutely disagree with #2, Neha. That will break a lot of infrastructure within LinkedIn. That said, removing "." might break other people as well, but I think we should have a clearer idea of how much usage there is on either side.
> > >> >> >
> > >> >> > -Todd
> > >> >> >
> > >> >> > On Fri, Jul 10, 2015 at 2:08 PM, Neha Narkhede <n...@confluent.io> wrote:
> > >> >> >
> > >> >> >> "." seems natural for grouping topic names. +1 for 2) going forward only without breaking previously created topics with "_" though that might require us to patch the code somewhat awkwardly till we phase it out a couple (purposely left vague to stay out of Ewen's wrath :-)) versions later.
> > >> >> >>
> > >> >> >> On Fri, Jul 10, 2015 at 2:02 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
> > >> >> >>
> > >> >> >> > I don't think we should break existing topics. Just disallow new topics going forward.
> > >> >> >> >
> > >> >> >> > Agree that having both is horrible, but we should have a solution that fails when you run "kafka-topics.sh --create", not when you configure Ganglia.
> > >> >> >> >
> > >> >> >> > Gwen
> > >> >> >> >
> > >> >> >> > On Fri, Jul 10, 2015 at 1:53 PM, Jay Kreps <j...@confluent.io> wrote:
> > >> >> >> > > Unfortunately '.' is pretty common too.
> > >> >> >> > > I agree that it is perverse, but people seem to do it. Breaking all the topics with '.' in the name seems like it could be worse than combining metrics for people who have a 'foo_bar' AND 'foo.bar' (and after all, having both is DEEPLY perverse, no?).
> > >> >> >> > >
> > >> >> >> > > Where is our Dean of Compatibility, Ewen, on this?
> > >> >> >> > >
> > >> >> >> > > -Jay
> > >> >> >> > >
> > >> >> >> > > On Fri, Jul 10, 2015 at 1:32 PM, Todd Palino <tpal...@gmail.com> wrote:
> > >> >> >> > >
> > >> >> >> > >> My selfish point of view is that we do #1, as we use "_" extensively in topic names here :) I also happen to think it's the right choice, specifically because "." has more special meanings, as you noted.
> > >> >> >> > >>
> > >> >> >> > >> -Todd
> > >> >> >> > >>
> > >> >> >> > >> On Fri, Jul 10, 2015 at 1:30 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
> > >> >> >> > >>
> > >> >> >> > >> > Unintentional side effect from allowing IP addresses in consumer client IDs :)
> > >> >> >> > >> >
> > >> >> >> > >> > So the question is, what do we do now?
> > >> >> >> > >> >
> > >> >> >> > >> > 1) disallow "."
> > >> >> >> > >> > 2) disallow "_"
> > >> >> >> > >> > 3) find a reversible way to encode "." and "_" that won't break existing metrics
> > >> >> >> > >> > 4) all of the above?
> > >> >> >> > >> >
> > >> >> >> > >> > btw. it looks like "." and ".." are currently valid. Topic names are used for directories, right?
> > >> >> >> > >> > this sounds like fun :)
> > >> >> >> > >> >
> > >> >> >> > >> > I vote for option #1, although if someone has a good idea for #3 it will be even better.
> > >> >> >> > >> >
> > >> >> >> > >> > Gwen
> > >> >> >> > >> >
> > >> >> >> > >> > On Fri, Jul 10, 2015 at 1:22 PM, Grant Henke <ghe...@cloudera.com> wrote:
> > >> >> >> > >> > > Found it was added here: https://issues.apache.org/jira/browse/KAFKA-697
> > >> >> >> > >> > >
> > >> >> >> > >> > > On Fri, Jul 10, 2015 at 3:18 PM, Todd Palino <tpal...@gmail.com> wrote:
> > >> >> >> > >> > >
> > >> >> >> > >> > >> This was definitely changed at some point after KAFKA-495. The question is when and why.
> > >> >> >> > >> > >>
> > >> >> >> > >> > >> Here's the relevant code from that patch:
> > >> >> >> > >> > >>
> > >> >> >> > >> > >> ===================================================================
> > >> >> >> > >> > >> --- core/src/main/scala/kafka/utils/Topic.scala (revision 1390178)
> > >> >> >> > >> > >> +++ core/src/main/scala/kafka/utils/Topic.scala (working copy)
> > >> >> >> > >> > >> @@ -21,24 +21,21 @@
> > >> >> >> > >> > >>  import util.matching.Regex
> > >> >> >> > >> > >>
> > >> >> >> > >> > >>  object Topic {
> > >> >> >> > >> > >> +  val legalChars = "[a-zA-Z0-9_-]"
> > >> >> >> > >> > >>
> > >> >> >> > >> > >> -Todd
> > >> >> >> > >> > >>
> > >> >> >> > >> > >> On Fri, Jul 10, 2015 at 1:02 PM, Grant Henke <ghe...@cloudera.com> wrote:
> > >> >> >> > >> > >>
> > >> >> >> > >> > >> > kafka.common.Topic shows that currently period is a valid character and I have verified I can use kafka-topics.sh to create a new topic with a period.
> > >> >> >> > >> > >> >
> > >> >> >> > >> > >> > AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK currently uses Topic.validate before writing to Zookeeper.
> > >> >> >> > >> > >> >
> > >> >> >> > >> > >> > Should period character support be removed? I was under the same impression as Gwen, that a period was used by many as a way to "group" topics.
> > >> >> >> > >> > >> >
> > >> >> >> > >> > >> > The code is pasted below since it's small:
> > >> >> >> > >> > >> >
> > >> >> >> > >> > >> > object Topic {
> > >> >> >> > >> > >> >   val legalChars = "[a-zA-Z0-9\\._\\-]"
> > >> >> >> > >> > >> >   private val maxNameLength = 255
> > >> >> >> > >> > >> >   private val rgx = new Regex(legalChars + "+")
> > >> >> >> > >> > >> >
> > >> >> >> > >> > >> >   val InternalTopics = Set(OffsetManager.OffsetsTopicName)
> > >> >> >> > >> > >> >
> > >> >> >> > >> > >> >   def validate(topic: String) {
> > >> >> >> > >> > >> >     if (topic.length <= 0)
> > >> >> >> > >> > >> >       throw new InvalidTopicException("topic name is illegal, can't be empty")
> > >> >> >> > >> > >> >     else if (topic.equals(".") || topic.equals(".."))
> > >> >> >> > >> > >> >       throw new InvalidTopicException("topic name cannot be \".\" or \"..\"")
> > >> >> >> > >> > >> >     else if (topic.length > maxNameLength)
> > >> >> >> > >> > >> >       throw new InvalidTopicException("topic name is illegal, can't be longer than " + maxNameLength + " characters")
> > >> >> >> > >> > >> >
> > >> >> >> > >> > >> >     rgx.findFirstIn(topic) match {
> > >> >> >> > >> > >> >       case Some(t) =>
> > >> >> >> > >> > >> >         if (!t.equals(topic))
> > >> >> >> > >> > >> >           throw new InvalidTopicException("topic name " + topic + " is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'")
> > >> >> >> > >> > >> >       case None => throw new InvalidTopicException("topic name " + topic + " is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'")
> > >> >> >> > >> > >> >     }
> > >> >> >> > >> > >> >   }
> > >> >> >> > >> > >> > }
> > >> >> >> > >> > >> >
> > >> >> >> > >> > >> > On Fri, Jul 10, 2015 at 2:50 PM, Todd Palino <tpal...@gmail.com> wrote:
> > >> >> >> > >> > >> >
> > >> >> >> > >> > >> > > I had to go look this one up again to make sure - https://issues.apache.org/jira/browse/KAFKA-495
> > >> >> >> > >> > >> > >
> > >> >> >> > >> > >> > > The only valid characters for topic names are alphanumeric, underscore, and dash. A period is not supposed to be a valid character to use.
> > >> >> >> > >> > >> > > If you're seeing them, then one of two things has happened:
> > >> >> >> > >> > >> > >
> > >> >> >> > >> > >> > > 1) You have topic names that are grandfathered in from before that patch
> > >> >> >> > >> > >> > > 2) The patch is not working properly and there is somewhere in the broker that the standard is not being enforced.
> > >> >> >> > >> > >> > >
> > >> >> >> > >> > >> > > -Todd
> > >> >> >> > >> > >> > >
> > >> >> >> > >> > >> > > On Fri, Jul 10, 2015 at 12:13 PM, Brock Noland <br...@apache.org> wrote:
> > >> >> >> > >> > >> > >
> > >> >> >> > >> > >> > > > On Fri, Jul 10, 2015 at 11:34 AM, Gwen Shapira <gshap...@cloudera.com> wrote:
> > >> >> >> > >> > >> > > > > Hi Kafka Fans,
> > >> >> >> > >> > >> > > > >
> > >> >> >> > >> > >> > > > > If you have one topic named "kafka_lab_2" and the other named "kafka.lab.2", the topic level metrics will be named kafka_lab_2 for both, effectively making it impossible to monitor them properly.
> > >> >> >> > >> > >> > > > >
> > >> >> >> > >> > >> > > > > The reason this happens is that using "." in topic names is pretty common, especially as a way to group topics into data centers, relevant apps, etc - basically a work-around to our current lack of name spaces.
> > >> >> >> > >> > >> > > > > However, most metric monitoring systems use "." to annotate hierarchy, so to avoid issues around metric names, Kafka replaces the "." in the name with an underscore.
> > >> >> >> > >> > >> > > > >
> > >> >> >> > >> > >> > > > > This generates good metric names, but creates the problem with name collisions.
> > >> >> >> > >> > >> > > > >
> > >> >> >> > >> > >> > > > > I'm wondering if it makes sense to simply limit the range of characters permitted in a topic name and disallow "_"? Obviously existing topics will need to remain as is, which is a bit awkward.
> > >> >> >> > >> > >> > > >
> > >> >> >> > >> > >> > > > Interesting problem! Many if not most users I personally am aware of use "_" as a separator in topic names. I am sure that many users would be quite surprised by this limitation. With that said, I am sure they'd transition accordingly.
> > >> >> >> > >> > >> > > >
> > >> >> >> > >> > >> > > > > If anyone has better backward-compatible solutions to this, I'm all ears :)
> > >> >> >> > >> > >> > > > >
> > >> >> >> > >> > >> > > > > Gwen
> > >> >> >> > >> > >> >
> > >> >> >> > >> > >> > --
> > >> >> >> > >> > >> > Grant Henke
> > >> >> >> > >> > >> > Solutions Consultant | Cloudera
> > >> >> >> > >> > >> > ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
> > >> >> >> > >> > >
> > >> >> >> > >> > > --
> > >> >> >> > >> > > Grant Henke
> > >> >> >> > >> > > Solutions Consultant | Cloudera
> > >> >> >> > >> > > ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
> > >> >> >>
> > >> >> >> --
> > >> >> >> Thanks,
> > >> >> >> Neha
> > >> >
> > >> > --
> > >> > Thanks,
> > >> > Ewen
> > >
> > > --
> > > Thanks,
> > > Ewen