date:20150621

Re: Review Request 35677: Patch for KAFKA-2288

2015-06-21 Thread Jun Rao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35677/#review88723
---


Thanks for the patch. I created a topic w/o customized config. On broker 
startup, I still saw the following logging.

[2015-06-21 21:01:21,569] INFO Read configuration for topic test 
(kafka.server.KafkaServer)
[2015-06-21 21:01:21,569] INFO LogConfig values: 
segment.bytes = 1073741824
 (kafka.log.LogConfig)
 
In general, I am wondering if it will be too verbose to log even overridden 
configs when a broker has thousands of topics. Also, the user can always find 
out the overridden configs through the admin tool. The reason that we log the 
broker config is that there is no such tool for broker configs.

- Jun Rao


On June 20, 2015, 12:59 a.m., Gwen Shapira wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35677/
> ---
> 
> (Updated June 20, 2015, 12:59 a.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2288
> https://issues.apache.org/jira/browse/KAFKA-2288
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> minor corrections to LogConfig and KafkaConfigTest
> 
> 
> Diffs
> -
> 
>   clients/src/main/java/org/apache/kafka/common/config/AbstractConfig.java 
> bae528d31516679bed88ee61b408f209f185a8cc 
>   core/src/main/scala/kafka/log/LogConfig.scala 
> fc41132d2bf29439225ec581829eb479f98cc416 
>   core/src/main/scala/kafka/server/KafkaServer.scala 
> 52dc728bb1ab4b05e94dc528da1006040e2f28c9 
>   core/src/test/scala/unit/kafka/log/LogConfigTest.scala 
> 19dcb47f3f406b8d6c3668297450ab6b534e4471 
>   core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 
> 98a5b042a710d3c1064b0379db1d152efc9eabee 
>   core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 
> 2428dbd7197a58cf4cad42ef82b385dab3a2b15e 
> 
> Diff: https://reviews.apache.org/r/35677/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Gwen Shapira
> 
>

Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-21 Thread Gwen Shapira

Ah, I see this in rejected alternatives now. Sorry :)

I actually prefer the idea of a separate project for framework +
connectors over having the framework be part of Apache Kafka.

Looking at nearby examples: Hadoop has created a wide ecosystem of
projects, with Sqoop and Flume supplying connectors. Spark on the
other hand keeps its subprojects as part of Apache Spark.

When I look at both projects, I see that Flume and Sqoop created
active communities (that was especially true a few years back when we
were rapidly growing), with many companies contributing. Spark OTOH
(and with all respect to my friends at Spark), has tons of
contributors to its core, but much less activity on its sub-projects
(for example, SparkStreaming). I strongly believe that SparkStreaming
is under-served by being a part of Spark, especially when compared to
Storm which is an independent project with its own community.

The way I see it, connector frameworks are significantly simpler than
distributed data stores (although they are pretty large in terms of
code base, especially with CopyCat having its own distributed
processing framework). This means that the barrier to contribution to
connector frameworks is lower, both for contributing to the framework
and for contributing connectors. Separate communities can also have
different rules regarding dependencies and committership.
Committership is the big one, and IMO what prevents SparkStreaming
from growing - I can give someone a commit bit on Sqoop without giving
them any power over Hadoop. That is not true for Spark and SparkStreaming.
This means that a CopyCat community (with its own sexy cat logo) will
be able to attract more volunteers and grow at a faster pace than core
Kafka, making it more useful to the community.

The other part is that just like Kafka will be more useful with a
connector framework, a connector framework tends to work better when
there are lots of connectors. So if we decide to partition the Kafka /
Connector framework / Connectors triad, I'm not sure which
partitioning makes more sense. Giving CopyCat (I love the name. You
can say things like "get the data into MySQL and CC Kafka") its own
community will allow the CopyCat community to accept connector
contributions, which is good for CopyCat and for Kafka adoption.
Oracle and Netezza contributed connectors to Sqoop. They probably
couldn't have contributed them at all if Sqoop were inside Hadoop, and
they can't really open-source their own stuff through GitHub, so it was
a win for our community. This doesn't negate the possibility of creating
connectors for CopyCat without contributing them to the community (like
the popular Teradata connector for Sqoop).

Regarding ease of use and adoption: Right now, a lot of people adopt
Kafka as a standalone piece, while Hadoop usually shows up through a
distribution. I expect that soon people will start adopting Kafka
through distributions, so the framework and a collection of connectors
will be part of every distribution, in the same way that no one thinks
of Sqoop or Flume as standalone projects. With a bunch of Kafka
distributions out there, people will get Kafka + Framework +
Connectors, with a core connection portion being common to multiple
distributions - this will allow even easier adoption, while allowing
the Kafka community to focus on core Kafka.

The point about documentation that Ewen has made in the KIP is a good
one. We definitely want to point people to the right place for export
/ import tools. However, it sounds solvable with a few links.

Sorry for the lengthy essay - I'm a bit passionate about connectors
and want to see CopyCat off to a great start in life :)

(BTW, I think Apache is a great place for CopyCat. I'll be happy to
help with the process of incubating it.)

On Fri, Jun 19, 2015 at 2:47 PM, Jay Kreps wrote:
> I think we want the connectors to be federated just because trying to
> maintain all the connectors centrally would be really painful. I think if
> we really do this well we would want to have >100 of these connectors so it
> really won't make sense to maintain them with the project. I think the
> thought was just to include the framework and maybe one simple connector as
> an example.
>
> Thoughts?
>
> -Jay
>
> On Fri, Jun 19, 2015 at 2:38 PM, Gwen Shapira wrote:
>
>> I think BikeShed will be a great name.
>>
>> Can you clarify the scope? The KIP discusses a framework and also a few
>> example connectors. Does the addition include just the framework
>> (and perhaps an example or two), or do we plan to start accepting
>> connectors to the Apache Kafka project?
>>
>> Gwen
>>
>> On Thu, Jun 18, 2015 at 3:09 PM, Jay Kreps wrote:
>> > I think the only problem we came up with was that Kafka KopyKat
>> > abbreviates as KKK which is not ideal in the US. Copykat would still
>> > be googlable without that issue. :-)
>> >
>> > -Jay
>> >
>> > On Thu, Jun 18, 2015 at 1:20 PM, Otis Gospodnetic <
>> > otis.gospodne...@gmail.com> wrote:
>> >
>> >> Just a comment on the

Re: Review Request 35677: Patch for KAFKA-2288

2015-06-21 Thread Gwen Shapira



> On June 22, 2015, 4:23 a.m., Jun Rao wrote:
> > Thanks for the patch. I created a topic w/o customized config. On broker 
> > startup, I still saw the following logging.
> > 
> > [2015-06-21 21:01:21,569] INFO Read configuration for topic test 
> > (kafka.server.KafkaServer)
> > [2015-06-21 21:01:21,569] INFO LogConfig values: 
> > segment.bytes = 1073741824
> >  (kafka.log.LogConfig)
> >  
> > In general, I am wondering if it will be too verbose to log even overridden 
> > configs when a broker has thousands of topics. Also, the user can always 
> > find out the overridden configs through the admin tool. The reason that we 
> > log the broker config is that there is no such tool for broker configs.

You are right, it will be too much on large clusters.
I'll remove the extra logging.


- Gwen


