[jira] [Commented] (KAFKA-2367) Add Copycat runtime data API

Ewen Cheslack-Postava (JIRA) Tue, 11 Aug 2015 21:36:02 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692893#comment-14692893
 ]


Ewen Cheslack-Postava commented on KAFKA-2367:
----------------------------------------------

[~wushujames] See the "Schema Versions and Projection" section on the wiki page 
I wrote up: https://cwiki.apache.org/confluence/display/KAFKA/Copycat+Data+API

It isn't strictly necessary to support this in the data API (which isn't really 
internal; it is a public API that connectors use), but it might be nice to 
provide schema projection in that API so it doesn't need to be implemented 
by each connector or for each serializer implementation. This would be relevant, 
for example, in a sink connector that needs to normalize data (e.g., all data 
going into an Avro file in HDFS needs to have the same schema). If parts of the 
stream ever contain mixed schema versions, you probably want to project to 
the later of the two schemas and write all the data using that updated schema.
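
As a rough illustration of the projection described above, here is a minimal 
sketch in Python. It is not the Copycat API: the dict-based schema layout, the 
REQUIRED marker, and the project/normalize helpers are all hypothetical names 
made up for this example. It assumes the newer schema only adds fields that 
carry defaults, so older records can be upgraded by filling those defaults in.

```python
# Hypothetical sketch of schema projection; none of these names come from Copycat.
# A schema is a version number plus a mapping of field name -> default value.

REQUIRED = object()  # marks a field that has no default value

SCHEMA_V1 = {"version": 1, "fields": {"id": REQUIRED, "name": REQUIRED}}
SCHEMA_V2 = {"version": 2, "fields": {"id": REQUIRED, "name": REQUIRED, "email": ""}}


def project(record, target_schema):
    """Return a copy of `record` shaped to fit `target_schema`.

    Fields missing from the record take the target schema's default;
    fields unknown to the target schema are dropped.
    """
    projected = {}
    for field, default in target_schema["fields"].items():
        if field in record:
            projected[field] = record[field]
        elif default is REQUIRED:
            raise ValueError(f"cannot project: no value or default for {field!r}")
        else:
            projected[field] = default
    return projected


def normalize(batch):
    """Project every (schema, record) pair in a batch to the latest schema seen."""
    latest = max((schema for schema, _ in batch), key=lambda s: s["version"])
    return latest, [project(record, latest) for _, record in batch]
```

A sink connector could call something like normalize() on each batch before 
writing a file, so that every record in the file shares one schema. Whether 
this logic lives in the data API or in each connector is exactly the open 
question in the comment above.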

> Add Copycat runtime data API
> ----------------------------
>
>                 Key: KAFKA-2367
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2367
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: copycat
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Ewen Cheslack-Postava
>             Fix For: 0.8.3
>
>
> Design the API used for runtime data in Copycat. This API is used to 
> construct schemas and records that Copycat processes. This needs to be a 
> fairly general data model (think Avro, JSON, Protobufs, Thrift) in order to 
> support complex, varied data types that may be input from/output to many data 
> systems.
> This issue should also address the serialization interfaces used 
> within Copycat, which translate the runtime data into serialized byte[] form. 
> It is important that these be considered together because the data format can 
> be used in multiple ways (records, partition IDs, partition offsets), so it 
> and the corresponding serializers must be sufficient for all these use cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
