[ https://issues.apache.org/jira/browse/KAFKA-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642362#comment-14642362 ]
ASF GitHub Bot commented on KAFKA-2366:
---------------------------------------

GitHub user ewencp opened a pull request:

    https://github.com/apache/kafka/pull/99

    KAFKA-2366 [WIP]; Copycat

    This is an initial patch implementing the basics of Copycat for KIP-26.

    The intent here is to start a review of the key pieces of the core API and get a reasonably functional, baseline, non-distributed implementation of Copycat in place to get things rolling. The current patch has a number of known issues that need to be addressed before a final version:

    * Some build-related issues. Specifically, the patch requires some locally installed dependencies (see below), ignores checkstyle for the runtime data library because it's lifted from Avro currently and likely won't last in its current form, and some Gradle task dependencies aren't quite right because I haven't gotten rid of the dependency on `core` (which should now be an easy patch since new consumer groups are in a much better state).
    * This patch currently depends on some Confluent trunk code because I prototyped with our Avro serializers w/ schema-registry support. We need to figure out what we want to provide as an example built-in set of serializers. Unlike core Kafka, where we could ignore the issue by providing only ByteArray or String serializers, this is pretty central to how Copycat works.
    * This patch uses a hacked-up version of Avro as its runtime data format. Not sure if we want to go through the entire API discussion just to get some basic code committed, so I filed KAFKA-2367 to handle that separately. The core connector APIs and the runtime data APIs are entirely orthogonal.
    * This patch needs some updates to get aligned with recent new consumer changes (specifically, I'm aware of the ConcurrentModificationException issue on exit). More generally, the new consumer is in flux but Copycat depends on it, so there are likely to be some negative interactions.
    * The layout feels a bit awkward to me right now because I ported it from a Maven layout. We don't have nearly the same level of granularity in Kafka currently (core and clients, plus the mostly ignored examples, log4j-appender, and a couple of contribs). We might want to reorganize, although keeping data+api separate from runtime and connector plugins is useful for minimizing dependencies.
    * There are a variety of other things (e.g., I'm not happy with the exception hierarchy or how exceptions are currently handled, TopicPartition doesn't really need to be duplicated unless we want Copycat entirely isolated from the Kafka APIs, etc.), but I expect we'll cover those in the review.

    Before commenting on the patch, it's probably worth reviewing https://issues.apache.org/jira/browse/KAFKA-2365 and https://issues.apache.org/jira/browse/KAFKA-2366 to get an idea of what I had in mind for a) what we ultimately want with all the Copycat patches and b) what we aim to cover in this initial patch. My hope is that we can use a WIP patch (after the current obvious deficiencies are addressed) while recognizing that we want to make iterative progress with a bunch of subsequent PRs.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ewencp/kafka copycat

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/99.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #99

----
commit 11981d2eaa2f61e81251104d6051acf6fd3911b3
Author: Ewen Cheslack-Postava <m...@ewencp.org>
Date:   2015-07-24T20:20:15Z

    Add copycat-data and copycat-api

commit 0233456c297c79c8f351dc7683a12b491d5682e8
Author: Ewen Cheslack-Postava <m...@ewencp.org>
Date:   2015-07-24T21:59:54Z

    Add copycat-avro and copycat-runtime

commit e14942cb20952263c26540fc333b7e3dc624c09c
Author: Ewen Cheslack-Postava <m...@ewencp.org>
Date:   2015-07-25T02:52:47Z

    Add Copycat file connector.

commit 31cd1caf3c48417bcfb56b8c85dfd2419712953c
Author: Ewen Cheslack-Postava <m...@ewencp.org>
Date:   2015-07-26T20:48:00Z

    Add CLI tools for Copycat.

commit 4a9b4f3c671bbba3b5d05a2ac6fed65b018649ee
Author: Ewen Cheslack-Postava <m...@ewencp.org>
Date:   2015-07-26T21:03:52Z

    Add some helpful Copycat-specific build and test targets that cover all Copycat packages.

----

> Initial patch for Copycat
> -------------------------
>
>                 Key: KAFKA-2366
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2366
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Ewen Cheslack-Postava
>             Fix For: 0.8.3
>
>
> This covers the initial patch for Copycat. The goal here is to get some
> baseline code in place, not necessarily the finalized implementation.
> The key thing we'll want here is the connector/task API, which defines how
> third parties write connectors.
> Beyond that the goal is to have a basically functional standalone Copycat
> implementation -- enough that we can run and test any connector code with
> reasonable coverage of functionality; specifically, it's important that core
> concepts like offset commit and resuming connector tasks function properly.
> These two things obviously interact, so development of the standalone worker
> may affect the design of connector APIs.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)