On Sat, Jul 11, 2015 at 12:54 AM, Ewen Cheslack-Postava <e...@confluent.io> wrote:

> On Fri, Jul 10, 2015 at 4:41 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>
>> Yeah, I have an actual customer who ran into this. Unfortunately,
>> inconsistencies in the way things are named are pretty common - just
>> look at Kafka's many CLI options.
>>
>> I don't think that supporting both and pointing at the docs with "I
>> told you so" when our metrics break is a good solution.
>>
>
> I agree, especially since we don't *already* have something in the docs
> indicating this will be an issue. I was flippant about the situation
> because I *wish* there was more careful consideration + naming policy in
> place, but I realize that doesn't always happen in practice. I guess I need
> to take Compatibility Czar more seriously :)
>
> I think the obvious practical options are as follows:
>
> 1. Kill support for "_". Piss off the entire set of people who currently
> use "_" anywhere in topic names.
> 2. Kill support for ".". Piss off the entire set of people who currently
> use "." anywhere in topic names.
> 3. Tell people they need to be careful about this issue. Piss off the set
> of people who use both "_" and "." *and* happen to have conflicting topic
> names. They will have some pain when they discover the issue and have to
> figure out how to move one of those topics over to a non-conflicting name.
> I'm going to claim that this group must be an *extremely* small fraction of
> users, which doesn't make it better to allow things to break for them, but
> at least gives us an idea of the scale of impact.
>
> (One other alternative suggested earlier was encoding metric names to
> account for differences; given the metric renaming mess in the last
> release, I'm extremely hesitant to suggest anything of the sort...)
>
> None of the options are ideal, but to me, 3 seems like the least painful,
> both for us and for the vast majority of users. It seems to me that the
> number of users that would complain about (1) or (2) drastically outweighs
> the number that would complain about (3).
>
> At this point, I don't think it's practical to keep switching the rules
> about which characters are allowed and which aren't because the previous
> attempts haven't been successful -- it seems the rules have changed
> multiple times, whether intentionally or accidentally, such that any more
> changes will cause problems. At this point, I think we just need to accept
> being liberal in accepting the range of topic names that have been
> permitted so far and make the best of the situation, even if it means only
> being able to warn people of conflicts.
>
> Here's another alternative: how about being liberal with topic name
> characters, but upon topic creation we convert the name to the metric name
> and fail if there's a conflict with another topic? This is relatively
> expensive (requires getting the metric name of all other topics), but it
> avoids the bad situation we're encountering here (conflicting metrics),
> avoids getting into a persistent conflict (we kill topic creation when we
> detect the issue rather than noticing it when the metrics conflict
> happens), and keeps the vast majority of existing users happy (both _ and .
> work in topic names as long as you don't create topics with conflicting
> metric names).
>
> There are definitely details to be worked out (auto topic creation?), but
> it seems like a more realistic solution than to start disallowing _ or . in
> topic names.
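
To make that last alternative concrete, here is a rough sketch of what such a creation-time check could look like. It is only an illustration, not actual Kafka code: the names TopicCollisionCheck, metricName, findCollision and validateNoCollision are made up, and it assumes the metric name is derived simply by replacing "." with "_", as described earlier in the thread.

```scala
// Hypothetical sketch, not part of Kafka. All names here are invented
// for illustration.
object TopicCollisionCheck {

  // Assumption: the metric name is the topic name with '.' replaced by '_'.
  def metricName(topic: String): String = topic.replace('.', '_')

  // Returns an existing topic (other than newTopic itself) whose metric
  // name is the same as the metric name of newTopic, if there is one.
  def findCollision(newTopic: String, existingTopics: Iterable[String]): Option[String] = {
    val candidate = metricName(newTopic)
    existingTopics.find(t => t != newTopic && metricName(t) == candidate)
  }

  // Fails topic creation up front instead of letting the metrics collide later.
  def validateNoCollision(newTopic: String, existingTopics: Iterable[String]): Unit =
    findCollision(newTopic, existingTopics).foreach { existing =>
      throw new IllegalArgumentException(
        "topic name " + newTopic + " collides with existing topic " + existing +
          ": both map to metric name " + metricName(newTopic))
    }
}
```

With this, validateNoCollision("a_b", Seq("a.b")) would throw, while validateNoCollision("a_b", Seq("c.d")) would pass. The lookup walks every existing topic, which is the "relatively expensive" part Ewen mentions, and the sketch says nothing about auto topic creation.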

I was thinking the same. Allow a.b or a_b, but not both a.b and a_b. This
seems like it will impact a trivial number of users and keep both the "."
and "_" camps happy.

>
> -Ewen
>
>>
>> On Fri, Jul 10, 2015 at 4:33 PM, Ewen Cheslack-Postava
>> <e...@confluent.io> wrote:
>> > I figure you'll probably see complaints no matter what change you make.
>> > Gwen, given that you raised this, another important question might be how
>> > many people you see using *both*. I'm guessing this question came up
>> > because you actually saw a conflict? But I'd imagine (or at least hope)
>> > that most organizations are mostly consistent about naming topics -- they
>> > standardize on one or the other.
>> >
>> > Since there's no "right" way to name them, I'd just leave it supporting
>> > both and document the potential conflict in metrics. And if people use
>> > both naming schemes, they probably deserve to suffer for their
>> > inconsistency :)
>> >
>> > -Ewen
>> >
>> > On Fri, Jul 10, 2015 at 3:28 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>> >
>> >> I find dots more common in my customer base, so I will definitely feel
>> >> the pain of removing them.
>> >>
>> >> However, "." are already used in metrics, file names, directories, etc
>> >> - so if we keep the dots, we need to keep code that translates them
>> >> and document the translation. Just banning "." seems more natural.
>> >> Also, as Grant mentioned, we'll probably have our own special usage
>> >> for "." down the line.
>> >>
>> >> On Fri, Jul 10, 2015 at 2:12 PM, Todd Palino <tpal...@gmail.com> wrote:
>> >> > I absolutely disagree with #2, Neha. That will break a lot of
>> >> > infrastructure within LinkedIn. That said, removing "." might break
>> >> > other people as well, but I think we should have a clearer idea of
>> >> > how much usage there is on either side.
>> >> >
>> >> > -Todd
>> >> >
>> >> >
>> >> > On Fri, Jul 10, 2015 at 2:08 PM, Neha Narkhede <n...@confluent.io> wrote:
>> >> >
>> >> >> "." seems natural for grouping topic names. +1 for 2) going forward
>> >> >> only without breaking previously created topics with "_" though that
>> >> >> might require us to patch the code somewhat awkwardly till we phase
>> >> >> it out a couple (purposely left vague to stay out of Ewen's wrath
>> >> >> :-)) versions later.
>> >> >>
>> >> >> On Fri, Jul 10, 2015 at 2:02 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>> >> >>
>> >> >> > I don't think we should break existing topics. Just disallow new
>> >> >> > topics going forward.
>> >> >> >
>> >> >> > Agree that having both is horrible, but we should have a solution
>> >> >> > that fails when you run "kafka-topics.sh --create", not when you
>> >> >> > configure Ganglia.
>> >> >> >
>> >> >> > Gwen
>> >> >> >
>> >> >> > On Fri, Jul 10, 2015 at 1:53 PM, Jay Kreps <j...@confluent.io> wrote:
>> >> >> > > Unfortunately '.' is pretty common too. I agree that it is
>> >> >> > > perverse, but people seem to do it. Breaking all the topics
>> >> >> > > with '.' in the name seems like it could be worse than
>> >> >> > > combining metrics for people who have a 'foo_bar' AND 'foo.bar'
>> >> >> > > (and after all, having both is DEEPLY perverse, no?).
>> >> >> > >
>> >> >> > > Where is our Dean of Compatibility, Ewen, on this?
>> >> >> > >
>> >> >> > > -Jay
>> >> >> > >
>> >> >> > > On Fri, Jul 10, 2015 at 1:32 PM, Todd Palino <tpal...@gmail.com> wrote:
>> >> >> > >
>> >> >> > >> My selfish point of view is that we do #1, as we use "_"
>> >> >> > >> extensively in topic names here :) I also happen to think it's
>> >> >> > >> the right choice, specifically because "." has more special
>> >> >> > >> meanings, as you noted.
>> >> >> > >>
>> >> >> > >> -Todd
>> >> >> > >>
>> >> >> > >>
>> >> >> > >> On Fri, Jul 10, 2015 at 1:30 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>> >> >> > >>
>> >> >> > >> > Unintentional side effect from allowing IP addresses in
>> >> >> > >> > consumer client IDs :)
>> >> >> > >> >
>> >> >> > >> > So the question is, what do we do now?
>> >> >> > >> >
>> >> >> > >> > 1) disallow "."
>> >> >> > >> > 2) disallow "_"
>> >> >> > >> > 3) find a reversible way to encode "." and "_" that won't
>> >> >> > >> > break existing metrics
>> >> >> > >> > 4) all of the above?
>> >> >> > >> >
>> >> >> > >> > btw. it looks like "." and ".." are currently valid. Topic
>> >> >> > >> > names are used for directories, right? this sounds like fun :)
>> >> >> > >> >
>> >> >> > >> > I vote for option #1, although if someone has a good idea
>> >> >> > >> > for #3 it will be even better.
>> >> >> > >> >
>> >> >> > >> > Gwen
>> >> >> > >> >
>> >> >> > >> > On Fri, Jul 10, 2015 at 1:22 PM, Grant Henke <ghe...@cloudera.com> wrote:
>> >> >> > >> > > Found it was added here:
>> >> >> > >> > > https://issues.apache.org/jira/browse/KAFKA-697
>> >> >> > >> > >
>> >> >> > >> > > On Fri, Jul 10, 2015 at 3:18 PM, Todd Palino <tpal...@gmail.com> wrote:
>> >> >> > >> > >
>> >> >> > >> > >> This was definitely changed at some point after KAFKA-495.
>> >> >> > >> > >> The question is when and why.
>> >> >> > >> > >>
>> >> >> > >> > >> Here's the relevant code from that patch:
>> >> >> > >> > >>
>> >> >> > >> > >> ===================================================================
>> >> >> > >> > >> --- core/src/main/scala/kafka/utils/Topic.scala (revision 1390178)
>> >> >> > >> > >> +++ core/src/main/scala/kafka/utils/Topic.scala (working copy)
>> >> >> > >> > >> @@ -21,24 +21,21 @@
>> >> >> > >> > >>  import util.matching.Regex
>> >> >> > >> > >>
>> >> >> > >> > >>  object Topic {
>> >> >> > >> > >> +  val legalChars = "[a-zA-Z0-9_-]"
>> >> >> > >> > >>
>> >> >> > >> > >>
>> >> >> > >> > >> -Todd
>> >> >> > >> > >>
>> >> >> > >> > >>
>> >> >> > >> > >> On Fri, Jul 10, 2015 at 1:02 PM, Grant Henke <ghe...@cloudera.com> wrote:
>> >> >> > >> > >>
>> >> >> > >> > >> > kafka.common.Topic shows that currently period is a valid
>> >> >> > >> > >> > character and I have verified I can use kafka-topics.sh
>> >> >> > >> > >> > to create a new topic with a period.
>> >> >> > >> > >> >
>> >> >> > >> > >> > AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK
>> >> >> > >> > >> > currently uses Topic.validate before writing to Zookeeper.
>> >> >> > >> > >> >
>> >> >> > >> > >> > Should period character support be removed? I was under
>> >> >> > >> > >> > the same impression as Gwen, that a period was used by
>> >> >> > >> > >> > many as a way to "group" topics.
>> >> >> > >> > >> >
>> >> >> > >> > >> > The code is pasted below since it's small:
>> >> >> > >> > >> >
>> >> >> > >> > >> > object Topic {
>> >> >> > >> > >> >   val legalChars = "[a-zA-Z0-9\\._\\-]"
>> >> >> > >> > >> >   private val maxNameLength = 255
>> >> >> > >> > >> >   private val rgx = new Regex(legalChars + "+")
>> >> >> > >> > >> >
>> >> >> > >> > >> >   val InternalTopics = Set(OffsetManager.OffsetsTopicName)
>> >> >> > >> > >> >
>> >> >> > >> > >> >   def validate(topic: String) {
>> >> >> > >> > >> >     if (topic.length <= 0)
>> >> >> > >> > >> >       throw new InvalidTopicException("topic name is illegal, can't be empty")
>> >> >> > >> > >> >     else if (topic.equals(".") || topic.equals(".."))
>> >> >> > >> > >> >       throw new InvalidTopicException("topic name cannot be \".\" or \"..\"")
>> >> >> > >> > >> >     else if (topic.length > maxNameLength)
>> >> >> > >> > >> >       throw new InvalidTopicException("topic name is illegal, can't be longer than " + maxNameLength + " characters")
>> >> >> > >> > >> >
>> >> >> > >> > >> >     rgx.findFirstIn(topic) match {
>> >> >> > >> > >> >       case Some(t) =>
>> >> >> > >> > >> >         if (!t.equals(topic))
>> >> >> > >> > >> >           throw new InvalidTopicException("topic name " + topic + " is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'")
>> >> >> > >> > >> >       case None => throw new InvalidTopicException("topic name " + topic + " is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'")
>> >> >> > >> > >> >     }
>> >> >> > >> > >> >   }
>> >> >> > >> > >> > }
>> >> >> > >> > >> >
>> >> >> > >> > >> > On Fri, Jul 10, 2015 at 2:50 PM, Todd Palino <tpal...@gmail.com> wrote:
>> >> >> > >> > >> >
>> >> >> > >> > >> > > I had to go look this one up again to make sure -
>> >> >> > >> > >> > > https://issues.apache.org/jira/browse/KAFKA-495
>> >> >> > >> > >> > >
>> >> >> > >> > >> > > The only valid characters for topic names are
>> >> >> > >> > >> > > alphanumeric, underscore, and dash. A period is not
>> >> >> > >> > >> > > supposed to be a valid character to use. If you're
>> >> >> > >> > >> > > seeing them, then one of two things has happened:
>> >> >> > >> > >> > >
>> >> >> > >> > >> > > 1) You have topic names that are grandfathered in from
>> >> >> > >> > >> > > before that patch
>> >> >> > >> > >> > > 2) The patch is not working properly and there is
>> >> >> > >> > >> > > somewhere in the broker that the standard is not being
>> >> >> > >> > >> > > enforced.
>> >> >> > >> > >> > >
>> >> >> > >> > >> > > -Todd
>> >> >> > >> > >> > >
>> >> >> > >> > >> > >
>> >> >> > >> > >> > > On Fri, Jul 10, 2015 at 12:13 PM, Brock Noland <br...@apache.org> wrote:
>> >> >> > >> > >> > >
>> >> >> > >> > >> > > > On Fri, Jul 10, 2015 at 11:34 AM, Gwen Shapira <gshap...@cloudera.com> wrote:
>> >> >> > >> > >> > > > > Hi Kafka Fans,
>> >> >> > >> > >> > > > >
>> >> >> > >> > >> > > > > If you have one topic named "kafka_lab_2" and the
>> >> >> > >> > >> > > > > other named "kafka.lab.2", the topic level metrics
>> >> >> > >> > >> > > > > will be named kafka_lab_2 for both, effectively
>> >> >> > >> > >> > > > > making it impossible to monitor them properly.
>> >> >> > >> > >> > > > >
>> >> >> > >> > >> > > > > The reason this happens is that using "." in topic
>> >> >> > >> > >> > > > > names is pretty common, especially as a way to
>> >> >> > >> > >> > > > > group topics into data centers, relevant apps, etc
>> >> >> > >> > >> > > > > - basically a work-around to our current lack of
>> >> >> > >> > >> > > > > name spaces. However, most metric monitoring
>> >> >> > >> > >> > > > > systems use "." to annotate hierarchy, so to avoid
>> >> >> > >> > >> > > > > issues around metric names, Kafka replaces the "."
>> >> >> > >> > >> > > > > in the name with an underscore.
>> >> >> > >> > >> > > > >
>> >> >> > >> > >> > > > > This generates good metric names, but creates the
>> >> >> > >> > >> > > > > problem with name collisions.
>> >> >> > >> > >> > > > >
>> >> >> > >> > >> > > > > I'm wondering if it makes sense to simply limit the
>> >> >> > >> > >> > > > > range of characters permitted in a topic name and
>> >> >> > >> > >> > > > > disallow "_"? Obviously existing topics will need
>> >> >> > >> > >> > > > > to remain as is, which is a bit awkward.
>> >> >> > >> > >> > > >
>> >> >> > >> > >> > > > Interesting problem! Many if not most users I
>> >> >> > >> > >> > > > personally am aware of use "_" as a separator in topic
>> >> >> > >> > >> > > > names. I am sure that many users would be quite
>> >> >> > >> > >> > > > surprised by this limitation. With that said, I am
>> >> >> > >> > >> > > > sure they'd transition accordingly.
>> >> >> > >> > >> > > >
>> >> >> > >> > >> > > > >
>> >> >> > >> > >> > > > > If anyone has better backward-compatible solutions
>> >> >> > >> > >> > > > > to this, I'm all ears :)
>> >> >> > >> > >> > > > >
>> >> >> > >> > >> > > > > Gwen
>> >> >> > >> > >> > > >
>> >> >> > >> > >> > >
>> >> >> > >> > >> >
>> >> >> > >> > >> > --
>> >> >> > >> > >> > Grant Henke
>> >> >> > >> > >> > Solutions Consultant | Cloudera
>> >> >> > >> > >> > ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>> >> >> > >> > >> >
>> >> >> > >> > >>
>> >> >> > >> > >
>> >> >> > >> > > --
>> >> >> > >> > > Grant Henke
>> >> >> > >> > > Solutions Consultant | Cloudera
>> >> >> > >> > > ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>> >> >> > >> >
>> >> >> > >>
>> >> >> >
>> >> >>
>> >> >> --
>> >> >> Thanks,
>> >> >> Neha
>> >> >>
>> >
>> > --
>> > Thanks,
>> > Ewen
>
> --
> Thanks,
> Ewen