[ https://issues.apache.org/jira/browse/KAFKA-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702063#comment-14702063 ]

Neha Narkhede commented on KAFKA-2367:
--------------------------------------

I think there are various tradeoffs, as with most choices that a framework is 
presented with :)

The tradeoffs I see are:
1. Agility vs maturity: The maturity argument is that Avro is an advanced 
serialization library that already exists and, despite having been through 
various compatibility issues, is now well tested and widely adopted. The 
agility argument against Avro is that a new framework like Copycat may be 
able to move faster (over several releases) by owning and fixing its own 
runtime data model, rather than waiting for the Avro community to release a 
patched version. This is a problem we struggled with in ZkClient, 
codahale-metrics, and ZooKeeper on the core Kafka side, and though one can 
argue that the Avro community is better, it remains a concern. The success of 
the Copycat framework depends on its ability to remain the framework of 
choice for copying data to Kafka, and as an early project, agility is key.
2. Cost/time savings vs control: The cost/time-savings argument favors 
adopting Avro even though we need only a very small percentage of it. This 
does save us a little time upfront, but the downside is that Copycat then 
depends on Avro (and all of its dependencies). I'm totally in favor of using 
a mature open source library, but looking at how little of Avro's code we 
would actually use, I couldn't convince myself that the upfront savings 
justify the dependency. After all, there will be bugs in either codebase; 
what matters is finding the fastest way to fix them.
3. Generic public interface to encourage connector developers: This is a very 
"right-brain" argument and a subtle one. I agree with [~jkreps] here. Given 
that our goal should be to attract a large ecosystem of connectors, I would 
want us to remove every bit of pain and friction that might cause connector 
developers to question our choice of Avro or spend time clarifying its impact 
on them. I understand that in practice this isn't a concern, and as long as 
we have the right serializers it will not even be very visible, but something 
as simple as a SchemaBuilder imported from org.apache.avro can start this 
discussion and distract connector developers who aren't necessarily Avro fans 
(see the sketch below).
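
To make the API-surface concern concrete, here is a minimal sketch. The Avro 
calls are real (org.apache.avro.SchemaBuilder has been available since Avro 
1.7.5); the Copycat-owned builder shown in the trailing comment is 
hypothetical, not an agreed design:

    import org.apache.avro.Schema;
    import org.apache.avro.SchemaBuilder;

    public class ConnectorSchemaExample {
        public static void main(String[] args) {
            // With Avro as the runtime data model, connector developers build
            // schemas directly against org.apache.avro types:
            Schema avroSchema = SchemaBuilder.record("PageView")
                .fields()
                  .requiredString("url")
                  .requiredLong("timestamp")
                .endRecord();
            System.out.println(avroSchema.toString(true));

            // A Copycat-owned equivalent (hypothetical names) would keep the
            // public surface inside the framework's own package, e.g.:
            //
            //   CopycatSchema schema = CopycatSchemaBuilder.struct("PageView")
            //       .field("url", Type.STRING)
            //       .field("timestamp", Type.INT64)
            //       .build();
        }
    }

Either way the expressiveness is the same; the question is only whose types 
show up in every connector's imports.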

Overall, given these tradeoffs, I'm leaning towards us building a custom 
runtime data API and not depending on all of Avro. 

> Add Copycat runtime data API
> ----------------------------
>
>                 Key: KAFKA-2367
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2367
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: copycat
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Ewen Cheslack-Postava
>             Fix For: 0.8.3
>
>
> Design the API used for runtime data in Copycat. This API is used to 
> construct schemas and records that Copycat processes. This needs to be a 
> fairly general data model (think Avro, JSON, Protobufs, Thrift) in order to 
> support complex, varied data types that may be input from/output to many data 
> systems.
> This issue should also address the serialization interfaces used within 
> Copycat, which translate the runtime data into serialized byte[] form. It is 
> important that these be considered together because the data format is used 
> in multiple ways (records, partition IDs, partition offsets), so the data 
> model and the corresponding serializers must be sufficient for all of these 
> use cases.
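
As a rough illustration of the serialization interfaces the quoted 
description calls for, here is a minimal sketch in Java. Every name in it 
(RecordSerializer, CopycatSchema) is a hypothetical placeholder, not 
Copycat's actual API, which this issue is meant to design:

    // Hypothetical schema handle; a real design would model types, fields,
    // optionality, and so on.
    final class CopycatSchema {
        // type information elided for brevity
    }

    // One interface has to cover every use of the data format: record
    // keys/values, partition IDs, and partition offsets, which is why the
    // data model and serializers must be designed together.
    interface RecordSerializer<T> {
        // Translate a runtime value, described by its schema, into bytes.
        byte[] serialize(CopycatSchema schema, T value);

        // Recover the runtime value from bytes, using the schema to
        // interpret them.
        T deserialize(CopycatSchema schema, byte[] bytes);
    }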


