Re: [Discussion] Limitations on topic names

2015-07-24 Thread Grant Henke
Noting here that the period '.' also causes potentially confusing behavior when using regex whitelists or blacklists. It can be easily worked around but users need to be aware of escaping the period. If I create two topics 'a.c' and 'abc' and start the following consumer, both topics will be consu
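
The effect described above can be reproduced with a small sketch. It uses Python's `re` module rather than Kafka's consumer, on the assumption that a whitelist is matched against the whole topic name; for this case an unescaped period behaves the same way in Java regular expressions. The helper name `matching` is made up for illustration.

```python
import re

topics = ["a.c", "abc"]

def matching(pattern, names):
    # Keep only the names that the pattern matches in full.
    return [n for n in names if re.fullmatch(pattern, n)]

# Unescaped: '.' matches any single character, so both topics match.
unescaped = matching("a.c", topics)

# Escaped: re.escape turns '.' into a literal period.
escaped = matching(re.escape("a.c"), topics)

print(unescaped)
print(escaped)
```

Escaping the period (for example `a\.c`) restricts the match to the topic that really contains a dot.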

Re: [Discussion] Limitations on topic names

2015-07-13 Thread Joel Koshy
One way to get around this conflict could be to replace . with _ and _ with __ On Sat, Jul 11, 2015 at 10:33 AM, Todd Palino wrote: > I tend to agree with this as a compromise at this point. The reality is that > this is technical debt that has built up in the project, and it does not go > away
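
A minimal sketch of the mapping suggested above, assuming it is applied character by character ('.' becomes '_', '_' becomes '__'). The function name `encode_topic` is hypothetical and is not part of Kafka.

```python
def encode_topic(topic):
    # '.' -> '_' and '_' -> '__'; every other character is kept as-is.
    return "".join(
        "__" if ch == "_" else "_" if ch == "." else ch
        for ch in topic
    )

# The two topics from the original report no longer collide.
dotted = encode_topic("kafka.lab.2")       # 'kafka_lab_2'
underscored = encode_topic("kafka_lab_2")  # 'kafka__lab__2'

# Edge case: an adjacent '.' and '_' still encode to the same string.
edge_a = encode_topic("a._b")
edge_b = encode_topic("a_.b")
```

Under this reading the scheme separates the common case, but names in which '.' and '_' sit next to each other can still map to the same metric name, so it is not fully reversible.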

Re: [Discussion] Limitations on topic names

2015-07-13 Thread Joel Koshy
This did come up in the discussion in KAFKA-1902. It is somewhat concerning that something very specific - in this case (what I think is a limitation [1]) in certain metric reporters should drive the decision on what constitutes a legal topic name in Kafka - especially when all the characters in qu

Re: [Discussion] Limitations on topic names

2015-07-13 Thread Jun Rao
Magnus, Converting dot to _ is essentially our way of escaping in the scope part of the metric name. The issue is that your options for escaping are limited due to the constraints in the reporters. For example, the Ganglia reporter replaces anything other than alphanumeric, -, _ and dot with _ in the
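
To illustrate why escaping options are limited, here is a rough sketch of the kind of sanitization described above. It is based only on the description in this message, not on the Ganglia reporter's actual code; `sanitize` is a hypothetical name.

```python
import re

def sanitize(name):
    # Replace anything other than alphanumerics, '-', '_' and '.' with '_'.
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)

# A backslash-escaped dot, as in "foo.my\.topic.feh", does not survive:
# the backslash itself is rewritten to '_'.
sanitized = sanitize("foo.my\\.topic.feh")
print(sanitized)
```

Any escape character outside the allowed set is rewritten by such a reporter, so the escape has to be built from the allowed characters themselves.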

Re: [Discussion] Limitations on topic names

2015-07-12 Thread Magnus Edenhill
Hi, since dots seem to be a problem on the metrics side, why not let the metrics side handle it by escaping troublesome characters? E.g. "foo.my\.topic.feh" Let's not push the problem upstream. Replacing "." with another set of allowed characters "__" seems like a bad idea since it is ambiguous:
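
The escaping idea could look roughly like the sketch below, assuming a backslash is used as the escape character and that topic names never contain a backslash themselves. The function names are hypothetical.

```python
import re

def escape_topic(topic):
    # Mark dots that belong to the topic name with a backslash.
    return topic.replace(".", "\\.")

def split_metric_name(name):
    # Split only on dots that are not preceded by a backslash,
    # then remove the escapes from each component.
    parts = re.split(r"(?<!\\)\.", name)
    return [p.replace("\\.", ".") for p in parts]

metric = "foo." + escape_topic("my.topic") + ".feh"
components = split_metric_name(metric)
print(metric)
print(components)
```

The topic name is recovered intact as the middle component, which is what makes this approach reversible as long as the reporter preserves the backslash.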

Re: [Discussion] Limitations on topic names

2015-07-12 Thread Jun Rao
First, a couple of clarifications on this. 1. Currently, we allow Kafka topic names to contain dots, except that we disallow topic names that are exactly "." or ".." (which can cause weird problems when mapping to file directories and ZK paths as Gwen pointed out). 2. When creating the Coda Hale metrics,
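
The rule in point 1 can be sketched as follows. The character class is the one quoted elsewhere in this thread ([a-zA-Z0-9\\._\\-]); the function name is hypothetical, and any further checks the real validator may perform are deliberately left out.

```python
import re

# Legal characters as quoted in the thread: alphanumerics, '.', '_' and '-'.
LEGAL_TOPIC = re.compile(r"[a-zA-Z0-9._-]+")

def is_valid_topic(name):
    # "." and ".." are rejected because they clash with directory and ZK paths.
    if name in (".", ".."):
        return False
    return LEGAL_TOPIC.fullmatch(name) is not None

for candidate in ["kafka.lab.2", "kafka_lab_2", ".", "..", "bad/name"]:
    print(candidate, is_valid_topic(candidate))
```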

Re: [Discussion] Limitations on topic names

2015-07-12 Thread Gwen Shapira
I like the "let's warn people of conflicts when creating the topic" suggestion. IMO, automatic topic creation as currently done is buggy either way (send data and hope the topic is ready before retries run out, potentially failing with the super helpful NO_LEADER error), so I don't mind leaving it b

Re: [Discussion] Limitations on topic names

2015-07-12 Thread Joe Stein
Can we provide a tool so folks can "sync back" old topic names to new so their clusters aren't format-lopsided? ~ Joestein On Jul 11, 2015 1:33 PM, "Todd Palino" wrote: > I tend to agree with this as a compromise at this point. The reality is > that this is technical debt that has built up in th

Re: [Discussion] Limitations on topic names

2015-07-11 Thread Todd Palino
I tend to agree with this as a compromise at this point. The reality is that this is technical debt that has built up in the project, and it does not go away by documenting it, and it will only get worse. As pointed out, eliminating either character at this point is going to cause problems for

Re: [Discussion] Limitations on topic names

2015-07-11 Thread Brock Noland
On Sat, Jul 11, 2015 at 12:54 AM, Ewen Cheslack-Postava wrote: > On Fri, Jul 10, 2015 at 4:41 PM, Gwen Shapira wrote: > >> Yeah, I have an actual customer who ran into this. Unfortunately, >> inconsistencies in the way things are named are pretty common - just >> look at Kafka's many CLI options.

Re: [Discussion] Limitations on topic names

2015-07-11 Thread Guozhang Wang
To resolve the metrics conflicts, we can alternatively let Kafka replace "." with double underscores "__" if that is the primary reason for topic name restrictions. Guozhang On Sat, Jul 11, 2015 at 12:54 AM, Ewen Cheslack-Postava wrote: > On Fri, Jul 10, 2015 at 4:41 PM, Gwen Shapira > w

Re: [Discussion] Limitations on topic names

2015-07-11 Thread Ewen Cheslack-Postava
On Fri, Jul 10, 2015 at 4:41 PM, Gwen Shapira wrote: > Yeah, I have an actual customer who ran into this. Unfortunately, > inconsistencies in the way things are named are pretty common - just > look at Kafka's many CLI options. > > I don't think that supporting both and pointing at the docs with

Re: [Discussion] Limitations on topic names

2015-07-10 Thread Gwen Shapira
Yeah, I have an actual customer who ran into this. Unfortunately, inconsistencies in the way things are named are pretty common - just look at Kafka's many CLI options. I don't think that supporting both and pointing at the docs with "I told you so" when our metrics break is a good solution. On F

Re: [Discussion] Limitations on topic names

2015-07-10 Thread Ewen Cheslack-Postava
I figure you'll probably see complaints no matter what change you make. Gwen, given that you raised this, another important question might be how many people you see using *both*. I'm guessing this question came up because you actually saw a conflict? But I'd imagine (or at least hope) that most or

Re: [Discussion] Limitations on topic names

2015-07-10 Thread Gwen Shapira
I find dots more common in my customer base, so I will definitely feel the pain of removing them. However, "." are already used in metrics, file names, directories, etc - so if we keep the dots, we need to keep code that translates them and document the translation. Just banning "." seems more nat

Re: [Discussion] Limitations on topic names

2015-07-10 Thread Todd Palino
I absolutely disagree with #2, Neha. That will break a lot of infrastructure within LinkedIn. That said, removing "." might break other people as well, but I think we should have a clearer idea of how much usage there is on either side. -Todd On Fri, Jul 10, 2015 at 2:08 PM, Neha Narkhede wrote

Re: [Discussion] Limitations on topic names

2015-07-10 Thread Ashish Singh
The problem with '.' seems only to be in case of metrics. Should kafka replace '.' with some special character, not in [a-zA-Z0-9\\._\\-] or some reserved seq of characters? On Fri, Jul 10, 2015 at 2:08 PM, Neha Narkhede wrote: > "." seems natural for grouping topic names. +1 for 2) going forwar

Re: [Discussion] Limitations on topic names

2015-07-10 Thread Neha Narkhede
"." seems natural for grouping topic names. +1 for 2) going forward only without breaking previously created topics with "_" though that might require us to patch the code somewhat awkwardly till we phase it out a couple (purposely left vague to stay out of Ewen's wrath :-)) versions later. On Fri

Re: [Discussion] Limitations on topic names

2015-07-10 Thread Todd Palino
Yes, agree here. While it can be a little confusing, I think it's better to just disallow the character for all creation steps so you can't create more "bad" topic names, but not try and enforce it for topics that already exist. Anyone who is in that situation is already there with regards to metri

Re: [Discussion] Limitations on topic names

2015-07-10 Thread Gwen Shapira
I don't think we should break existing topics. Just disallow new topics going forward. Agree that having both is horrible, but we should have a solution that fails when you run "kafka-topics.sh --create", not when you configure Ganglia. Gwen On Fri, Jul 10, 2015 at 1:53 PM, Jay Kreps wrote: > U

Re: [Discussion] Limitations on topic names

2015-07-10 Thread Jay Kreps
Unfortunately '.' is pretty common too. I agree that it is perverse, but people seem to do it. Breaking all the topics with '.' in the name seems like it could be worse than combining metrics for people who have a 'foo_bar' AND 'foo.bar' (and after all, having both is DEEPLY perverse, no?). Where

Re: [Discussion] Limitations on topic names

2015-07-10 Thread Grant Henke
I vote for #1 too. A special reason Kafka may use '.' in the future is for hierarchical or namespaced topics. On Fri, Jul 10, 2015 at 3:32 PM, Todd Palino wrote: > My selfish point of view is that we do #1, as we use "_" extensively in > topic names here :) I also happen to think it's the right

Re: [Discussion] Limitations on topic names

2015-07-10 Thread Todd Palino
My selfish point of view is that we do #1, as we use "_" extensively in topic names here :) I also happen to think it's the right choice, specifically because "." has more special meanings, as you noted. -Todd On Fri, Jul 10, 2015 at 1:30 PM, Gwen Shapira wrote: > Unintentional side effect fro

Re: [Discussion] Limitations on topic names

2015-07-10 Thread Todd Palino
Thanks, Grant. That seems like a bad solution to the problem that John ran into in that ticket. It's entirely reasonable to have separate validators for separate things, but it seems like the choice was made to try and mash it all into a single validator. And it appears that despite the commentary

Re: [Discussion] Limitations on topic names

2015-07-10 Thread Gwen Shapira
Unintentional side effect from allowing IP addresses in consumer client IDs :) So the question is, what do we do now? 1) disallow "." 2) disallow "_" 3) find a reversible way to encode "." and "_" that won't break existing metrics 4) all of the above? btw. it looks like "." and ".." are currentl

Re: [Discussion] Limitations on topic names

2015-07-10 Thread Grant Henke
Found it was added here: https://issues.apache.org/jira/browse/KAFKA-697 On Fri, Jul 10, 2015 at 3:18 PM, Todd Palino wrote: > This was definitely changed at some point after KAFKA-495. The question is > when and why. > > Here's the relevant code from that patch: > >

Re: [Discussion] Limitations on topic names

2015-07-10 Thread Todd Palino
This was definitely changed at some point after KAFKA-495. The question is when and why. Here's the relevant code from that patch: === --- core/src/main/scala/kafka/utils/Topic.scala (revision 1390178) +++ core/src/main/scala/kafka/u

Re: [Discussion] Limitations on topic names

2015-07-10 Thread Grant Henke
kafka.common.Topic shows that currently period is a valid character and I have verified I can use kafka-topics.sh to create a new topic with a period. AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK currently uses Topic.validate before writing to Zookeeper. Should period character supp

Re: [Discussion] Limitations on topic names

2015-07-10 Thread Todd Palino
I had to go look this one up again to make sure - https://issues.apache.org/jira/browse/KAFKA-495 The only valid characters for topic names are alphanumeric, underscore, and dash. A period is not supposed to be a valid character to use. If you're seeing them, then one of two things has happened:

Re: [Discussion] Limitations on topic names

2015-07-10 Thread Brock Noland
On Fri, Jul 10, 2015 at 11:34 AM, Gwen Shapira wrote: > Hi Kafka Fans, > > If you have one topic named "kafka_lab_2" and the other named > "kafka.lab.2", the topic level metrics will be named kafka_lab_2 for > both, effectively making it impossible to monitor them properly. > > The reason this hap

[Discussion] Limitations on topic names

2015-07-10 Thread Gwen Shapira
Hi Kafka Fans, If you have one topic named "kafka_lab_2" and the other named "kafka.lab.2", the topic level metrics will be named kafka_lab_2 for both, effectively making it impossible to monitor them properly. The reason this happens is that using "." in topic names is pretty common, especially
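
The collision can be shown with a short sketch, assuming the metric name is derived simply by replacing "." with "_" as described in this thread. The helper names are hypothetical.

```python
from collections import defaultdict

def metric_name(topic):
    # Assumed translation: dots in the topic name become underscores.
    return topic.replace(".", "_")

def find_collisions(topics):
    # Group topics by the metric name they map to and keep the clashes.
    by_metric = defaultdict(list)
    for topic in topics:
        by_metric[metric_name(topic)].append(topic)
    return {m: ts for m, ts in by_metric.items() if len(ts) > 1}

collisions = find_collisions(["kafka_lab_2", "kafka.lab.2", "orders"])
print(collisions)
```

A check of this kind, run at topic creation time, is one way to warn about a conflict before the metrics become indistinguishable.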