One way to get around this conflict could be to replace "." with "_" and "_" with "__".
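To make that concrete, here is a rough Scala sketch of the mapping. MetricNameEscaping and escape are made-up names for illustration, not anything in Kafka today. It does separate the kafka.lab.2 / kafka_lab_2 pair from the original report, though as far as I can tell it is not fully reversible: two adjacent dots and a single underscore both come out as "__".

```scala
// Hypothetical helper, not part of Kafka: sketches the proposed escaping.
object MetricNameEscaping {
  // Double every '_' first, then turn each '.' into a single '_'.
  // The order matters: doing the dots first would double them as well.
  def escape(topic: String): String =
    topic.replace("_", "__").replace(".", "_")

  def main(args: Array[String]): Unit = {
    // The pair from the original report now maps to two different names.
    println(escape("kafka.lab.2")) // kafka_lab_2
    println(escape("kafka_lab_2")) // kafka__lab__2
    // Two adjacent dots and one underscore still produce the same name.
    println(escape("a..b") == escape("a_b")) // true
  }
}
```

So the collision becomes much less likely, but a topic such as a..b can still clash with a_b.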
On Sat, Jul 11, 2015 at 10:33 AM, Todd Palino <tpal...@gmail.com> wrote:
> I tend to agree with this as a compromise at this point. The reality is that this is technical debt that has built up in the project, and it does not go away by documenting it, and it will only get worse.
>
> As pointed out, eliminating either character at this point is going to cause problems for someone. And unfortunately, Guozhang, converting to __ doesn't really solve the problem either because that is still a valid topic name that could collide. It's less likely, but all it does is move the debt around a little.
>
> -Todd
>
>> On Jul 11, 2015, at 10:16 AM, Brock Noland <br...@apache.org> wrote:
>>
>> On Sat, Jul 11, 2015 at 12:54 AM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
>>> On Fri, Jul 10, 2015 at 4:41 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>>>
>>>> Yeah, I have an actual customer who ran into this. Unfortunately, inconsistencies in the way things are named are pretty common - just look at Kafka's many CLI options.
>>>>
>>>> I don't think that supporting both and pointing at the docs with "I told you so" when our metrics break is a good solution.
>>>
>>> I agree, especially since we don't *already* have something in the docs indicating this will be an issue. I was flippant about the situation because I *wish* there was more careful consideration + naming policy in place, but I realize that doesn't always happen in practice. I guess I need to take Compatibility Czar more seriously :)
>>>
>>> I think the obvious practical options are as follows:
>>>
>>> 1. Kill support for "_". Piss off the entire set of people who currently use "_" anywhere in topic names.
>>> 2. Kill support for ".". Piss off the entire set of people who currently use "." anywhere in topic names.
>>> 3. Tell people they need to be careful about this issue. Piss off the set of people who use both "_" and "." *and* happen to have conflicting topic names. They will have some pain when they discover the issue and have to figure out how to move one of those topics over to a non-conflicting name. I'm going to claim that this group must be an *extremely* small fraction of users, which doesn't make it better to allow things to break for them, but at least gives us an idea of the scale of impact.
>>>
>>> (One other alternative suggested earlier was encoding metric names to account for differences; given the metric renaming mess in the last release, I'm extremely hesitant to suggest anything of the sort...)
>>>
>>> None of the options are ideal, but to me, 3 seems like the least painful. Both for us, and for the vast majority of users. It seems to me that the number of users that would complain about (1) or (2) drastically outweighs (3).
>>>
>>> At this point, I don't think it's practical to keep switching the rules about which characters are allowed and which aren't because the previous attempts haven't been successful -- it seems the rules have changed multiple times, whether intentionally or accidentally, such that any more changes will cause problems. At this point, I think we just need to accept being liberal in accepting the range of topic names that have been permitted so far and make the best of the situation, even if it means only being able to warn people of conflicts.
>>>
>>> Here's another alternative: how about being liberal with topic name characters, but upon topic creation we convert the name to the metric name and fail if there's a conflict with another topic? This is relatively expensive (requires getting the metric name of all other topics), but it avoids the bad situation we're encountering here (conflicting metrics), avoids getting into a persistent conflict (we kill topic creation when we detect the issue rather than noticing it when the metrics conflict happens), and keeps the vast majority of existing users happy (both _ and . work in topic names as long as you don't create topics with conflicting metric names).
>>>
>>> There are definitely details to be worked out (auto topic creation?), but it seems like a more realistic solution than to start disallowing _ or . in topic names.
>>
>> I was thinking the same. Allow a.b or a_b but not a.b and a_b. This seems like it will impact a trivial number of users and keep both the "." and "_" camps happy.
>>
>>>
>>> -Ewen
>>>
>>>>
>>>> On Fri, Jul 10, 2015 at 4:33 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
>>>>> I figure you'll probably see complaints no matter what change you make. Gwen, given that you raised this, another important question might be how many people you see using *both*. I'm guessing this question came up because you actually saw a conflict? But I'd imagine (or at least hope) that most organizations are mostly consistent about naming topics -- they standardize on one or the other.
>>>>>
>>>>> Since there's no "right" way to name them, I'd just leave it supporting both and document the potential conflict in metrics. And if people use both naming schemes, they probably deserve to suffer for their inconsistency :)
>>>>>
>>>>> -Ewen
>>>>>
>>>>>> On Fri, Jul 10, 2015 at 3:28 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>>>>>
>>>>>> I find dots more common in my customer base, so I will definitely feel the pain of removing them.
>>>>>>
>>>>>> However, "." is already used in metrics, file names, directories, etc - so if we keep the dots, we need to keep code that translates them and document the translation. Just banning "." seems more natural. Also, as Grant mentioned, we'll probably have our own special usage for "." down the line.
>>>>>>
>>>>>>> On Fri, Jul 10, 2015 at 2:12 PM, Todd Palino <tpal...@gmail.com> wrote:
>>>>>>> I absolutely disagree with #2, Neha. That will break a lot of infrastructure within LinkedIn. That said, removing "." might break other people as well, but I think we should have a clearer idea of how much usage there is on either side.
>>>>>>>
>>>>>>> -Todd
>>>>>>>
>>>>>>>> On Fri, Jul 10, 2015 at 2:08 PM, Neha Narkhede <n...@confluent.io> wrote:
>>>>>>>
>>>>>>>> "." seems natural for grouping topic names. +1 for 2) going forward only without breaking previously created topics with "_" though that might require us to patch the code somewhat awkwardly till we phase it out a couple (purposely left vague to stay out of Ewen's wrath :-)) versions later.
>>>>>>>>
>>>>>>>> On Fri, Jul 10, 2015 at 2:02 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>>>>>>>>
>>>>>>>>> I don't think we should break existing topics. Just disallow new topics going forward.
>>>>>>>>>
>>>>>>>>> Agree that having both is horrible, but we should have a solution that fails when you run "kafka-topics.sh --create", not when you configure Ganglia.
>>>>>>>>>
>>>>>>>>> Gwen
>>>>>>>>>
>>>>>>>>> On Fri, Jul 10, 2015 at 1:53 PM, Jay Kreps <j...@confluent.io> wrote:
>>>>>>>>>> Unfortunately '.' is pretty common too. I agree that it is perverse, but people seem to do it. Breaking all the topics with '.' in the name seems like it could be worse than combining metrics for people who have a 'foo_bar' AND 'foo.bar' (and after all, having both is DEEPLY perverse, no?).
>>>>>>>>>>
>>>>>>>>>> Where is our Dean of Compatibility, Ewen, on this?
>>>>>>>>>>
>>>>>>>>>> -Jay
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 10, 2015 at 1:32 PM, Todd Palino <tpal...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> My selfish point of view is that we do #1, as we use "_" extensively in topic names here :) I also happen to think it's the right choice, specifically because "." has more special meanings, as you noted.
>>>>>>>>>>>
>>>>>>>>>>> -Todd
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 10, 2015 at 1:30 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Unintentional side effect from allowing IP addresses in consumer client IDs :)
>>>>>>>>>>>>
>>>>>>>>>>>> So the question is, what do we do now?
>>>>>>>>>>>>
>>>>>>>>>>>> 1) disallow "."
>>>>>>>>>>>> 2) disallow "_"
>>>>>>>>>>>> 3) find a reversible way to encode "." and "_" that won't break existing metrics
>>>>>>>>>>>> 4) all of the above?
>>>>>>>>>>>>
>>>>>>>>>>>> btw. it looks like "." and ".." are currently valid. Topic names are used for directories, right? this sounds like fun :)
>>>>>>>>>>>>
>>>>>>>>>>>> I vote for option #1, although if someone has a good idea for #3 it will be even better.
>>>>>>>>>>>>
>>>>>>>>>>>> Gwen
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jul 10, 2015 at 1:22 PM, Grant Henke <ghe...@cloudera.com> wrote:
>>>>>>>>>>>>> Found it was added here: https://issues.apache.org/jira/browse/KAFKA-697
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jul 10, 2015 at 3:18 PM, Todd Palino <tpal...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> This was definitely changed at some point after KAFKA-495. The question is when and why.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here's the relevant code from that patch:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ===================================================================
>>>>>>>>>>>>>> --- core/src/main/scala/kafka/utils/Topic.scala (revision 1390178)
>>>>>>>>>>>>>> +++ core/src/main/scala/kafka/utils/Topic.scala (working copy)
>>>>>>>>>>>>>> @@ -21,24 +21,21 @@
>>>>>>>>>>>>>>  import util.matching.Regex
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  object Topic {
>>>>>>>>>>>>>> +  val legalChars = "[a-zA-Z0-9_-]"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Todd
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jul 10, 2015 at 1:02 PM, Grant Henke <ghe...@cloudera.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> kafka.common.Topic shows that currently period is a valid character and I have verified I can use kafka-topics.sh to create a new topic with a period.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK currently uses Topic.validate before writing to Zookeeper.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Should period character support be removed? I was under the same impression as Gwen, that a period was used by many as a way to "group" topics.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The code is pasted below since it's small:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> object Topic {
>>>>>>>>>>>>>>>   val legalChars = "[a-zA-Z0-9\\._\\-]"
>>>>>>>>>>>>>>>   private val maxNameLength = 255
>>>>>>>>>>>>>>>   private val rgx = new Regex(legalChars + "+")
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   val InternalTopics = Set(OffsetManager.OffsetsTopicName)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   def validate(topic: String) {
>>>>>>>>>>>>>>>     if (topic.length <= 0)
>>>>>>>>>>>>>>>       throw new InvalidTopicException("topic name is illegal, can't be empty")
>>>>>>>>>>>>>>>     else if (topic.equals(".") || topic.equals(".."))
>>>>>>>>>>>>>>>       throw new InvalidTopicException("topic name cannot be \".\" or \"..\"")
>>>>>>>>>>>>>>>     else if (topic.length > maxNameLength)
>>>>>>>>>>>>>>>       throw new InvalidTopicException("topic name is illegal, can't be longer than " + maxNameLength + " characters")
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     rgx.findFirstIn(topic) match {
>>>>>>>>>>>>>>>       case Some(t) =>
>>>>>>>>>>>>>>>         if (!t.equals(topic))
>>>>>>>>>>>>>>>           throw new InvalidTopicException("topic name " + topic + " is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'")
>>>>>>>>>>>>>>>       case None => throw new InvalidTopicException("topic name " + topic + " is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'")
>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Jul 10, 2015 at 2:50 PM, Todd Palino <tpal...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I had to go look this one up again to make sure - https://issues.apache.org/jira/browse/KAFKA-495
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The only valid characters for topic names are alphanumeric, underscore, and dash. A period is not supposed to be a valid character to use. If you're seeing them, then one of two things has happened:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) You have topic names that are grandfathered in from before that patch
>>>>>>>>>>>>>>>> 2) The patch is not working properly and there is somewhere in the broker that the standard is not being enforced.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Todd
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Jul 10, 2015 at 12:13 PM, Brock Noland <br...@apache.org> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Jul 10, 2015 at 11:34 AM, Gwen Shapira <gshap...@cloudera.com> wrote:
>>>>>>>>>>>>>>>>>> Hi Kafka Fans,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If you have one topic named "kafka_lab_2" and the other named "kafka.lab.2", the topic level metrics will be named kafka_lab_2 for both, effectively making it impossible to monitor them properly.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The reason this happens is that using "." in topic names is pretty common, especially as a way to group topics into data centers, relevant apps, etc - basically a work-around to our current lack of name spaces. However, most metric monitoring systems use "." to annotate hierarchy, so to avoid issues around metric names, Kafka replaces the "." in the name with an underscore.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This generates good metric names, but creates the problem with name collisions.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'm wondering if it makes sense to simply limit the range of characters permitted in a topic name and disallow "_"? Obviously existing topics will need to remain as is, which is a bit awkward.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Interesting problem! Many if not most users I personally am aware of use "_" as a separator in topic names. I am sure that many users would be quite surprised by this limitation. With that said, I am sure they'd transition accordingly.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If anyone has better backward-compatible solutions to this, I'm all ears :)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Gwen
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Grant Henke
>>>>>>>>>>>>>>> Solutions Consultant | Cloudera
>>>>>>>>>>>>>>> ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Grant Henke
>>>>>>>>>>>>> Solutions Consultant | Cloudera
>>>>>>>>>>>>> ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks,
>>>>>>>> Neha
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> Ewen
>>>
>>> --
>>> Thanks,
>>> Ewen
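On Ewen's create-time check quoted in this thread: a rough Scala sketch of what it might look like, assuming the metric name is derived only by swapping "." for "_" as Gwen describes. TopicCollisionCheck, metricName and create are hypothetical names for illustration, not Kafka APIs, and a real version would still have to fetch the existing topic list and deal with auto topic creation.

```scala
// Hypothetical sketch, not Kafka code: reject a new topic whose metric name
// would be identical to that of an existing topic.
object TopicCollisionCheck {
  // Assumption taken from the thread: '.' is replaced with '_' in metric names.
  def metricName(topic: String): String = topic.replace('.', '_')

  // Left(error message) when the metric name clashes with a different
  // existing topic, Right(updated set of topics) otherwise.
  def create(existing: Set[String], topic: String): Either[String, Set[String]] =
    existing.find(t => t != topic && metricName(t) == metricName(topic)) match {
      case Some(clash) =>
        Left(s"topic '$topic' would share metric names with existing topic '$clash'")
      case None =>
        Right(existing + topic)
    }
}
```

With that in place, creating kafka_lab_2 while kafka.lab.2 exists would fail up front, while a.b alongside c_d goes through. The cost is a scan over all existing topic names on every create, which is the expense Ewen mentions.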