Ok, someone answered a similar question on the Avro mailing list.

It *sounds* like the Avro messages sent to Kafka are wrapped and/or prepended 
with the SHA, which the consumer then uses to look up the schema. That makes 
more sense. 

On Aug 20, 2013, at 11:09 AM, Mark <static.void....@gmail.com> wrote:

> Thanks Jay I've already read the paper and Jira ticket (haven't read the 
> code) but I'm still confused on how to integrate this with Kafka. 
> 
> Say we write an Avro message (the message contains a SHA of the schema) to 
> Kafka and a consumer pulls off this message. How does the consumer know how 
> to deserialize the message to even be able to get to the SHA to look up the 
> full schema? Would this require wrapping all messages in another type of 
> message, like JSON: { hash: <16 bytes>, message: <Avro encoded message in bytes> }?
> 
> On Aug 20, 2013, at 9:33 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
> 
>> This paper has more information on what we are doing at LinkedIn:
>> http://sites.computer.org/debull/A12june/pipeline.pdf
>> 
>> This Avro JIRA has a schema repository implementation similar to the one
>> LinkedIn uses:
>> https://issues.apache.org/jira/browse/AVRO-1124
>> 
>> -Jay
>> 
>> 
>> On Tue, Aug 20, 2013 at 7:08 AM, Mark <static.void....@gmail.com> wrote:
>> 
>>> Can someone break down how message serialization would work with Avro?
>>> I've read that instead of adding a schema to every single event, it would
>>> be wise to add some sort of fingerprint to each message to identify which
>>> schema it should use. What I'm having trouble understanding is: how do we
>>> read the fingerprint without a schema? Don't we need the schema to
>>> deserialize? The same question goes for working with Hadoop: how does the
>>> input format know which schema to use?
>>> 
>>> Thanks
> 