[ https://issues.apache.org/jira/browse/KAFKA-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692853#comment-14692853 ]
Ewen Cheslack-Postava commented on KAFKA-2367:
----------------------------------------------

The runtime API should not affect serialization at all, so I don't think the JSON comment is relevant. If we wanted to use Avro for the runtime API, we would really just be lifting the Schema and GenericRecord classes, but none of the serialization code. I personally don't have any issue with doing that, but the concern was a) that someone might not like adding Avro as a dependency and b) that we do want to support different serialization formats (which, at a minimum, is necessary since you may have data in other formats delivered by other tools to Kafka, and we still want Copycat to be able to push that data to other systems such as HDFS) and don't want to treat Avro as a first-class citizen and other formats as second-class.

If nobody objects, I think using Avro directly isn't a bad choice. I dislike some of its choices (in particular that nullable fields need to be defined as union types with the null type), but I agree it would be better to offload maintaining that code to another project that is already going to be doing it anyway, and it does have well-thought-out schema migration support.

> Add Copycat runtime data API
> ----------------------------
>
>                 Key: KAFKA-2367
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2367
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: copycat
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Ewen Cheslack-Postava
>             Fix For: 0.8.3
>
>
> Design the API used for runtime data in Copycat. This API is used to
> construct schemas and records that Copycat processes. This needs to be a
> fairly general data model (think Avro, JSON, Protobufs, Thrift) in order to
> support complex, varied data types that may be input from/output to many data
> systems.
> This issue should also address the serialization interfaces used
> within Copycat, which translate the runtime data into serialized byte[] form.
> It is important that these be considered together because the data format can
> be used in multiple ways (records, partition IDs, partition offsets), so it
> and the corresponding serializers must be sufficient for all these use cases.
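
To make the separation argued for in the comment concrete (a runtime data model that does not dictate serialization, with several formats treated as equals), here is a minimal Java sketch. Every name in it (RuntimeRecord, RecordSerializer, KeyValueSerializer, ValueListSerializer) is hypothetical and chosen only for illustration; none of them is a Copycat, Kafka, or Avro API, and the two "formats" are toys standing in for real ones such as Avro or JSON.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: none of these types are Copycat, Kafka, or Avro APIs.

/** Runtime record: ordered field names mapped to plain Java values. */
final class RuntimeRecord {
    private final Map<String, Object> fields = new LinkedHashMap<>();

    RuntimeRecord put(String name, Object value) {
        fields.put(name, value); // a null value stands in for an absent optional field
        return this;
    }

    Map<String, Object> fields() {
        return Collections.unmodifiableMap(fields);
    }
}

/** Translates runtime data to byte[]; each format supplies its own implementation. */
interface RecordSerializer {
    byte[] serialize(RuntimeRecord record);
}

/** Toy format A: name=value pairs joined by ';'. */
final class KeyValueSerializer implements RecordSerializer {
    @Override
    public byte[] serialize(RuntimeRecord record) {
        StringJoiner joiner = new StringJoiner(";");
        for (Map.Entry<String, Object> e : record.fields().entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString().getBytes(StandardCharsets.UTF_8);
    }
}

/** Toy format B: values only, joined by ','. */
final class ValueListSerializer implements RecordSerializer {
    @Override
    public byte[] serialize(RuntimeRecord record) {
        StringJoiner joiner = new StringJoiner(",");
        for (Object v : record.fields().values()) {
            joiner.add(String.valueOf(v));
        }
        return joiner.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

The same RuntimeRecord can be handed to either serializer without changing the record itself, which is the property the comment asks of the runtime API. In this sketch a nullable field is simply a null value, rather than a union with the null type as in Avro.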