[ https://issues.apache.org/jira/browse/KAFKA-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702063#comment-14702063 ]
Neha Narkhede commented on KAFKA-2367:
--------------------------------------

I think there are various tradeoffs, as with most choices that a framework is presented with :) The tradeoffs I see are:

1. Agility vs. maturity: The maturity argument is that Avro is an advanced serialization library that already exists and, in spite of having been through various compatibility issues, is now well tested and adopted. The agility argument against Avro is that for a new framework like Copycat, we might be able to move faster (over several releases) by owning and fixing our runtime data model, rather than waiting for the Avro community to release a patched version. This is a problem we struggled with for ZkClient, codahale-metrics, and ZooKeeper on the core Kafka side, and though one can argue that the Avro community is better, this remains a concern. The success of the Copycat framework depends on its ability to always be the framework of choice for copying data to Kafka, and for an early project, agility is key.

2. Cost/time savings vs. control: The cost/time-savings argument favors adopting Avro even if we really need only a very small percentage of it. This does save us a little time upfront, but the downside is that Copycat then ends up depending on Avro (and all of its dependencies). I'm totally in favor of using a mature open source library, but observing the size of the code we would need to pull from Avro, I couldn't convince myself of the benefit it presents in saving some effort upfront. After all, there will be bugs in either codebase, and we'd have to find the fastest way to fix them.

3. Generic public interface to encourage connector developers: This is a very "right-brain" argument and a subtle one. I agree with [~jkreps] here. Given that our goal should be to attract a large ecosystem of connectors, I would want us to remove every bit of pain and friction that would cause connector developers to either question our choice of Avro or spend time clarifying its impact on them. I understand that in practice this isn't a concern, and as long as we have the right serializers this won't even be very visible, but a simple SchemaBuilder imported from org.apache.avro can start this discussion and distract connector developers who aren't necessarily Avro fans.

Overall, given the tradeoffs, I'm leaning towards us picking a custom one and not depending on all of Avro.

> Add Copycat runtime data API
> ----------------------------
>
>                 Key: KAFKA-2367
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2367
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: copycat
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Ewen Cheslack-Postava
>             Fix For: 0.8.3
>
>
> Design the API used for runtime data in Copycat. This API is used to construct schemas and records that Copycat processes. This needs to be a fairly general data model (think Avro, JSON, Protobufs, Thrift) in order to support complex, varied data types that may be input from/output to many data systems.
> This issue should also address the serialization interfaces used within Copycat, which translate the runtime data into serialized byte[] form. It is important that these be considered together because the data format can be used in multiple ways (records, partition IDs, partition offsets), so it and the corresponding serializers must be sufficient for all these use cases.
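For concreteness, here is a minimal sketch of what a Copycat-owned schema builder and serialization boundary could look like, in the spirit of the custom data API the comment leans towards. Every name in it (Type, Schema, SchemaBuilder, Serializer) is a hypothetical illustration, not a proposed or actual API:

{code:java}
// Hypothetical sketch of a Copycat-owned runtime data API.
// Names are illustrative only, not the actual design.
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class RuntimeDataSketch {

    // Primitive and composite types the runtime model would need to cover.
    enum Type { INT32, INT64, STRING, BOOLEAN, STRUCT }

    // An immutable schema: a type plus, for structs, named fields.
    static final class Schema {
        final Type type;
        final Map<String, Schema> fields;

        Schema(Type type, Map<String, Schema> fields) {
            this.type = type;
            this.fields = Collections.unmodifiableMap(fields);
        }
    }

    // A fluent builder, so connector developers never import org.apache.avro.
    static final class SchemaBuilder {
        private final Map<String, Schema> fields = new LinkedHashMap<>();

        static SchemaBuilder struct() { return new SchemaBuilder(); }

        SchemaBuilder field(String name, Type type) {
            fields.put(name, new Schema(type, Collections.emptyMap()));
            return this;
        }

        Schema build() { return new Schema(Type.STRUCT, fields); }
    }

    // The pluggable serialization boundary: runtime data in, byte[] out.
    // Avro, JSON, or other serializers would live behind this interface.
    interface Serializer {
        byte[] serialize(Schema schema, Map<String, Object> record);
    }

    public static void main(String[] args) {
        Schema pageView = SchemaBuilder.struct()
                .field("url", Type.STRING)
                .field("timestamp", Type.INT64)
                .build();

        Map<String, Object> record = new LinkedHashMap<>();
        record.put("url", "https://kafka.apache.org");
        record.put("timestamp", 1439000000000L);

        // A toy JSON-ish serializer (it ignores the schema and quotes every
        // value) just to exercise the interface end to end.
        Serializer toJson = (schema, rec) -> {
            StringBuilder sb = new StringBuilder("{");
            for (Map.Entry<String, Object> e : rec.entrySet()) {
                if (sb.length() > 1) sb.append(",");
                sb.append('"').append(e.getKey()).append("\":\"")
                  .append(e.getValue()).append('"');
            }
            return sb.append("}").toString().getBytes(StandardCharsets.UTF_8);
        };

        System.out.println(new String(toJson.serialize(pageView, record),
                StandardCharsets.UTF_8));
    }
}
{code}

The point of the sketch is the shape of the dependency graph: connector developers program against a small Copycat-owned surface, while concrete serialization formats sit behind the Serializer interface as swappable implementations, which is the decoupling the issue description asks the serialization interfaces to provide.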