Ok, someone answered a similar question in the Avro forum. It *sounds* like the Avro messages sent to Kafka are wrapped and/or prepended with the SHA, which the consumer then uses to look up the schema. That makes more sense.
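To make that concrete, here's a minimal sketch of the framing idea, assuming a fixed-width 16-byte fingerprint prepended to the raw Avro bytes (the function names, the use of MD5, and the dict standing in for a schema repository are all illustrative, not from any real Kafka or Avro client). The key point: the consumer can read the fingerprint without any schema, because it's a fixed-size header, and only then needs the schema to decode the payload.

```python
# Sketch of the "prepend a fingerprint" scheme, under assumed conventions:
# [16-byte schema fingerprint][Avro-encoded payload]. Names are hypothetical.
import hashlib

def fingerprint(schema_json: str) -> bytes:
    """16-byte fingerprint of the writer's schema (MD5 here for illustration)."""
    return hashlib.md5(schema_json.encode("utf-8")).digest()

def wrap(schema_json: str, avro_payload: bytes) -> bytes:
    """Producer side: prepend the fingerprint to the Avro-encoded bytes."""
    return fingerprint(schema_json) + avro_payload

def unwrap(message: bytes, schema_repo: dict) -> tuple:
    """Consumer side: the first 16 bytes need no schema to read -- they are
    a fixed-width header. Look the full schema up, then decode the rest."""
    fp, payload = message[:16], message[16:]
    schema_json = schema_repo[fp]  # e.g. fetched from a schema repository
    return schema_json, payload

# Round trip with a stand-in payload (real code would Avro-encode a record)
schema = '{"type": "record", "name": "Event", "fields": []}'
repo = {fingerprint(schema): schema}
msg = wrap(schema, b"\x02\x04avro-bytes")
found_schema, payload = unwrap(msg, repo)
assert found_schema == schema and payload == b"\x02\x04avro-bytes"
```

So there's no need for a JSON envelope: the fingerprint and payload can just be concatenated bytes, since the header length is fixed.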
On Aug 20, 2013, at 11:09 AM, Mark <static.void....@gmail.com> wrote:

> Thanks Jay. I've already read the paper and Jira ticket (haven't read the
> code) but I'm still confused on how to integrate this with Kafka.
>
> Say we write an Avro message (the message contains a SHA of the schema) to
> Kafka and a consumer pulls off this message. How does the consumer know how
> to deserialize the message to even be able to get to the SHA to look up the
> full schema? Would this require wrapping all messages in another type of
> message, like JSON { hash: <16 bytes>, message: <Avro encoded message in bytes> }?
>
> On Aug 20, 2013, at 9:33 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
>> This paper has more information on what we are doing at LinkedIn:
>> http://sites.computer.org/debull/A12june/pipeline.pdf
>>
>> This Avro JIRA has a schema repository implementation similar to the one
>> LinkedIn uses:
>> https://issues.apache.org/jira/browse/AVRO-1124
>>
>> -Jay
>>
>> On Tue, Aug 20, 2013 at 7:08 AM, Mark <static.void....@gmail.com> wrote:
>>
>>> Can someone break down how message serialization would work with Avro?
>>> I've read that instead of adding a schema to every single event, it would
>>> be wise to add some sort of fingerprint to each message to identify which
>>> schema should be used. What I'm having trouble understanding is: how do we
>>> read the fingerprint without a schema? Don't we need the schema to
>>> deserialize? Same question goes for working with Hadoop: how does the
>>> input format know which schema to use?
>>>
>>> Thanks