Can we provide a tool so folks can "sync back" old topic names to new ones, so their clusters aren't format lopsided?
~ Joestein

On Jul 11, 2015 1:33 PM, "Todd Palino" <tpal...@gmail.com> wrote:

> I tend to agree with this as a compromise at this point. The reality is
> that this is technical debt that has built up in the project, and it does
> not go away by documenting it, and it will only get worse.
>
> As pointed out, eliminating either character at this point is going to
> cause problems for someone. And unfortunately, Guozhang, converting to __
> doesn't really solve the problem either because that is still a valid topic
> name that could collide. It's less likely, but all it does is move the debt
> around a little.
>
> -Todd
>
>
> On Jul 11, 2015, at 10:16 AM, Brock Noland <br...@apache.org> wrote:
> >
> > On Sat, Jul 11, 2015 at 12:54 AM, Ewen Cheslack-Postava
> > <e...@confluent.io> wrote:
> >> On Fri, Jul 10, 2015 at 4:41 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
> >>
> >>> Yeah, I have an actual customer who ran into this. Unfortunately,
> >>> inconsistencies in the way things are named are pretty common - just
> >>> look at Kafka's many CLI options.
> >>>
> >>> I don't think that supporting both and pointing at the docs with "I
> >>> told you so" when our metrics break is a good solution.
> >>
> >> I agree, especially since we don't *already* have something in the docs
> >> indicating this will be an issue. I was flippant about the situation
> >> because I *wish* there was more careful consideration + naming policy in
> >> place, but I realize that doesn't always happen in practice. I guess I need
> >> to take Compatibility Czar more seriously :)
> >>
> >> I think the obvious practical options are as follows:
> >>
> >> 1. Kill support for "_". Piss off the entire set of people who currently
> >> use "_" anywhere in topic names.
> >> 2. Kill support for ".". Piss off the entire set of people who currently
> >> use "." anywhere in topic names.
> >> 3. Tell people they need to be careful about this issue.
> >> Piss off the set of people who use both "_" and "." *and* happen to
> >> have conflicting topic names. They will have some pain when they
> >> discover the issue and have to figure out how to move one of those
> >> topics over to a non-conflicting name.
> >> I'm going to claim that this group must be an *extremely* small fraction of
> >> users, which doesn't make it better to allow things to break for them, but
> >> at least gives us an idea of the scale of impact.
> >>
> >> (One other alternative suggested earlier was encoding metric names to
> >> account for differences; given the metric renaming mess in the last
> >> release, I'm extremely hesitant to suggest anything of the sort...)
> >>
> >> None of the options are ideal, but to me, 3 seems like the least painful.
> >> Both for us, and for the vast majority of users. It seems to me that the
> >> number of users that would complain about (1) or (2) drastically outweigh
> >> (3).
> >>
> >> At this point, I don't think it's practical to keep switching the rules
> >> about which characters are allowed and which aren't because the previous
> >> attempts haven't been successful -- it seems the rules have changed
> >> multiple times, whether intentionally or accidentally, such that any more
> >> changes will cause problems. At this point, I think we just need to accept
> >> being liberal in accepting the range of topic names that have been
> >> permitted so far and make the best of the situation, even if it means only
> >> being able to warn people of conflicts.
> >>
> >> Here's another alternative: how about being liberal with topic name
> >> characters, but upon topic creation we convert the name to the metric name
> >> and fail if there's a conflict with another topic?
> >> This is relatively expensive (requires getting the metric name of all
> >> other topics), but it
> >> avoids the bad situation we're encountering here (conflicting metrics),
> >> avoids getting into a persistent conflict (we kill topic creation when we
> >> detect the issue rather than noticing it when the metrics conflict
> >> happens), and keeps the vast majority of existing users happy (both _ and .
> >> work in topic names as long as you don't create topics with conflicting
> >> metric names).
> >>
> >> There are definitely details to be worked out (auto topic creation?), but
> >> it seems like a more realistic solution than to start disallowing _ or . in
> >> topic names.
> >
> > I was thinking the same. Allow a.b or a_b but not a.b and a_b. This
> > seems like it will impact a trivial amount of users and keep both the
> > "." and "_" camps happy.
> >
> >>
> >> -Ewen
> >>
> >>
> >>>
> >>> On Fri, Jul 10, 2015 at 4:33 PM, Ewen Cheslack-Postava
> >>> <e...@confluent.io> wrote:
> >>>> I figure you'll probably see complaints no matter what change you make.
> >>>> Gwen, given that you raised this, another important question might be how
> >>>> many people you see using *both*. I'm guessing this question came up
> >>>> because you actually saw a conflict? But I'd imagine (or at least hope)
> >>>> that most organizations are mostly consistent about naming topics -- they
> >>>> standardize on one or the other.
> >>>>
> >>>> Since there's no "right" way to name them, I'd just leave it supporting
> >>>> both and document the potential conflict in metrics. And if people use both
> >>>> naming schemes, they probably deserve to suffer for their inconsistency :)
> >>>>
> >>>> -Ewen
> >>>>
> >>>>> On Fri, Jul 10, 2015 at 3:28 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
> >>>>
> >>>>> I find dots more common in my customer base, so I will definitely feel
> >>>>> the pain of removing them.
> >>>>>
> >>>>> However, "."
> >>>>> are already used in metrics, file names, directories, etc
> >>>>> - so if we keep the dots, we need to keep code that translates them
> >>>>> and document the translation. Just banning "." seems more natural.
> >>>>> Also, as Grant mentioned, we'll probably have our own special usage
> >>>>> for "." down the line.
> >>>>>
> >>>>>> On Fri, Jul 10, 2015 at 2:12 PM, Todd Palino <tpal...@gmail.com> wrote:
> >>>>>> I absolutely disagree with #2, Neha. That will break a lot of
> >>>>>> infrastructure within LinkedIn. That said, removing "." might break other
> >>>>>> people as well, but I think we should have a clearer idea of how much usage
> >>>>>> there is on either side.
> >>>>>>
> >>>>>> -Todd
> >>>>>>
> >>>>>>
> >>>>>>> On Fri, Jul 10, 2015 at 2:08 PM, Neha Narkhede <n...@confluent.io> wrote:
> >>>>>>
> >>>>>>> "." seems natural for grouping topic names. +1 for 2) going forward only
> >>>>>>> without breaking previously created topics with "_" though that might
> >>>>>>> require us to patch the code somewhat awkwardly till we phase it out a
> >>>>>>> couple (purposely left vague to stay out of Ewen's wrath :-)) versions
> >>>>>>> later.
> >>>>>>>
> >>>>>>> On Fri, Jul 10, 2015 at 2:02 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
> >>>>>>>
> >>>>>>>> I don't think we should break existing topics. Just disallow new
> >>>>>>>> topics going forward.
> >>>>>>>>
> >>>>>>>> Agree that having both is horrible, but we should have a solution that
> >>>>>>>> fails when you run "kafka-topics.sh --create", not when you configure
> >>>>>>>> Ganglia.
> >>>>>>>>
> >>>>>>>> Gwen
> >>>>>>>>
> >>>>>>>> On Fri, Jul 10, 2015 at 1:53 PM, Jay Kreps <j...@confluent.io> wrote:
> >>>>>>>>> Unfortunately '.' is pretty common too. I agree that it is perverse, but
> >>>>>>>>> people seem to do it. Breaking all the topics with '.'
> >>>>>>>>> in the name seems
> >>>>>>>>> like it could be worse than combining metrics for people who have a
> >>>>>>>>> 'foo_bar' AND 'foo.bar' (and after all, having both is DEEPLY perverse,
> >>>>>>>>> no?).
> >>>>>>>>>
> >>>>>>>>> Where is our Dean of Compatibility, Ewen, on this?
> >>>>>>>>>
> >>>>>>>>> -Jay
> >>>>>>>>>
> >>>>>>>>> On Fri, Jul 10, 2015 at 1:32 PM, Todd Palino <tpal...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> My selfish point of view is that we do #1, as we use "_" extensively in
> >>>>>>>>>> topic names here :) I also happen to think it's the right choice,
> >>>>>>>>>> specifically because "." has more special meanings, as you noted.
> >>>>>>>>>>
> >>>>>>>>>> -Todd
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Jul 10, 2015 at 1:30 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Unintentional side effect from allowing IP addresses in consumer client
> >>>>>>>>>>> IDs :)
> >>>>>>>>>>>
> >>>>>>>>>>> So the question is, what do we do now?
> >>>>>>>>>>>
> >>>>>>>>>>> 1) disallow "."
> >>>>>>>>>>> 2) disallow "_"
> >>>>>>>>>>> 3) find a reversible way to encode "." and "_" that won't break existing
> >>>>>>>>>>> metrics
> >>>>>>>>>>> 4) all of the above?
> >>>>>>>>>>>
> >>>>>>>>>>> btw. it looks like "." and ".." are currently valid. Topic names are
> >>>>>>>>>>> used for directories, right? this sounds like fun :)
> >>>>>>>>>>>
> >>>>>>>>>>> I vote for option #1, although if someone has a good idea for #3 it
> >>>>>>>>>>> will be even better.
> >>>>>>>>>>>
> >>>>>>>>>>> Gwen
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Jul 10, 2015 at 1:22 PM, Grant Henke <ghe...@cloudera.com> wrote:
> >>>>>>>>>>>> Found it was added here:
> >>>>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-697
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Jul 10, 2015 at 3:18 PM, Todd Palino <tpal...@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> This was definitely changed at some point after KAFKA-495. The question is
> >>>>>>>>>>>>> when and why.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Here's the relevant code from that patch:
> >>>>>>>>>>>>> ===================================================================
> >>>>>>>>>>>>> --- core/src/main/scala/kafka/utils/Topic.scala (revision 1390178)
> >>>>>>>>>>>>> +++ core/src/main/scala/kafka/utils/Topic.scala (working copy)
> >>>>>>>>>>>>> @@ -21,24 +21,21 @@
> >>>>>>>>>>>>> import util.matching.Regex
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> object Topic {
> >>>>>>>>>>>>> + val legalChars = "[a-zA-Z0-9_-]"
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Todd
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Fri, Jul 10, 2015 at 1:02 PM, Grant Henke <ghe...@cloudera.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> kafka.common.Topic shows that currently period is a valid character and I
> >>>>>>>>>>>>>> have verified I can use kafka-topics.sh to create a new topic with a
> >>>>>>>>>>>>>> period.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK currently uses
> >>>>>>>>>>>>>> Topic.validate before writing to Zookeeper.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Should period character support be removed?
> >>>>>>>>>>>>>> I was under the same impression
> >>>>>>>>>>>>>> as Gwen, that a period was used by many as a way to "group" topics.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The code is pasted below since it's small:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> object Topic {
> >>>>>>>>>>>>>>   val legalChars = "[a-zA-Z0-9\\._\\-]"
> >>>>>>>>>>>>>>   private val maxNameLength = 255
> >>>>>>>>>>>>>>   private val rgx = new Regex(legalChars + "+")
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>   val InternalTopics = Set(OffsetManager.OffsetsTopicName)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>   def validate(topic: String) {
> >>>>>>>>>>>>>>     if (topic.length <= 0)
> >>>>>>>>>>>>>>       throw new InvalidTopicException("topic name is illegal, can't be empty")
> >>>>>>>>>>>>>>     else if (topic.equals(".") || topic.equals(".."))
> >>>>>>>>>>>>>>       throw new InvalidTopicException("topic name cannot be \".\" or \"..\"")
> >>>>>>>>>>>>>>     else if (topic.length > maxNameLength)
> >>>>>>>>>>>>>>       throw new InvalidTopicException("topic name is illegal, can't be longer than " + maxNameLength + " characters")
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>     rgx.findFirstIn(topic) match {
> >>>>>>>>>>>>>>       case Some(t) =>
> >>>>>>>>>>>>>>         if (!t.equals(topic))
> >>>>>>>>>>>>>>           throw new InvalidTopicException("topic name " + topic + " is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'")
> >>>>>>>>>>>>>>       case None => throw new InvalidTopicException("topic name " + topic + " is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'")
> >>>>>>>>>>>>>>     }
> >>>>>>>>>>>>>>   }
> >>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, Jul 10, 2015 at 2:50 PM, Todd Palino <tpal...@gmail.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I had to go look this one up again to make sure -
> >>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-495
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The only valid character names for topics are alphanumeric, underscore, and
> >>>>>>>>>>>>>>> dash. A period is not supposed to be a valid character to use. If you're
> >>>>>>>>>>>>>>> seeing them, then one of two things have happened:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 1) You have topic names that are grandfathered in from before that patch
> >>>>>>>>>>>>>>> 2) The patch is not working properly and there is somewhere in the broker
> >>>>>>>>>>>>>>> that the standard is not being enforced.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> -Todd
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Fri, Jul 10, 2015 at 12:13 PM, Brock Noland <br...@apache.org> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Fri, Jul 10, 2015 at 11:34 AM, Gwen Shapira <gshap...@cloudera.com>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>> Hi Kafka Fans,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> If you have one topic named "kafka_lab_2" and the other named
> >>>>>>>>>>>>>>>>> "kafka.lab.2", the topic level metrics will be named kafka_lab_2 for
> >>>>>>>>>>>>>>>>> both, effectively making it impossible to monitor them properly.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> The reason this happens is that using "."
> >>>>>>>>>>>>>>>>> in topic names is pretty
> >>>>>>>>>>>>>>>>> common, especially as a way to group topics into data centers,
> >>>>>>>>>>>>>>>>> relevant apps, etc - basically a work-around to our current lack of
> >>>>>>>>>>>>>>>>> name spaces. However, most metric monitoring systems use "." to
> >>>>>>>>>>>>>>>>> annotate hierarchy, so to avoid issues around metric names, Kafka
> >>>>>>>>>>>>>>>>> replaces the "." in the name with an underscore.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> This generates good metric names, but creates the problem with name
> >>>>>>>>>>>>>>>>> collisions.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I'm wondering if it makes sense to simply limit the range of
> >>>>>>>>>>>>>>>>> characters permitted in a topic name and disallow "_"? Obviously
> >>>>>>>>>>>>>>>>> existing topics will need to remain as is, which is a bit awkward.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Interesting problem! Many if not most users I personally am aware of
> >>>>>>>>>>>>>>>> use "_" as a separator in topic names. I am sure that many users would
> >>>>>>>>>>>>>>>> be quite surprised by this limitation. With that said, I am sure
> >>>>>>>>>>>>>>>> they'd transition accordingly.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> If anyone has better backward-compatible solutions to this, I'm all ears :)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Gwen
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> Grant Henke
> >>>>>>>>>>>>>> Solutions Consultant | Cloudera
> >>>>>>>>>>>>>> ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> Grant Henke
> >>>>>>>>>>>> Solutions Consultant | Cloudera
> >>>>>>>>>>>> ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Thanks,
> >>>>>>> Neha
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Thanks,
> >>>> Ewen
> >>
> >>
> >>
> >> --
> >> Thanks,
> >> Ewen
>
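
For reference, the create-time check Ewen proposes in the thread (convert the new topic name to its metric name and fail if another topic already maps to the same one) could look roughly like the sketch below. This is only an illustration under stated assumptions, not Kafka's actual code: `TopicCollisionCheck`, `metricName`, `findCollision` and `validateNewTopic` are hypothetical names, and the metric mapping is assumed to be nothing more than the "." to "_" replacement described in the thread.

```scala
// Hypothetical sketch; none of these names exist in Kafka itself.
object TopicCollisionCheck {
  // Assumed metric-name mapping, per the thread: every "." becomes "_".
  def metricName(topic: String): String = topic.replace('.', '_')

  // Returns an existing topic (other than newTopic itself) whose metric
  // name equals the metric name of newTopic, if there is one.
  def findCollision(newTopic: String, existing: Set[String]): Option[String] = {
    val wanted = metricName(newTopic)
    existing.find(t => t != newTopic && metricName(t) == wanted)
  }

  // Rejects creation when the new name would share a metric name with
  // another topic; otherwise returns the name unchanged.
  def validateNewTopic(newTopic: String, existing: Set[String]): Either[String, String] =
    findCollision(newTopic, existing) match {
      case Some(other) =>
        Left(s"topic '$newTopic' collides with '$other' in metric names")
      case None => Right(newTopic)
    }
}
```

With this, `a.b` alone or `a_b` alone is accepted, while creating `a_b` when `a.b` already exists is rejected. The cost Ewen mentions shows up as the scan over all existing topic names; auto topic creation would need the same check and is not covered here.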