Thanks Jay. I've already read the paper and the Jira ticket (haven't read the code), but I'm still confused about how to integrate this with Kafka.
Say we write an Avro message (the message contains a SHA of the schema) to Kafka and a consumer pulls off this message. How does the consumer know how to deserialize the message to even be able to get to the SHA to look up the full schema? Would this require wrapping all messages in another type of message, like JSON: { hash: <16 bytes>, message: <Avro encoded message in bytes> }?

On Aug 20, 2013, at 9:33 AM, Jay Kreps <jay.kr...@gmail.com> wrote:

> This paper has more information on what we are doing at LinkedIn:
> http://sites.computer.org/debull/A12june/pipeline.pdf
>
> This Avro JIRA has a schema repository implementation similar to the one
> LinkedIn uses:
> https://issues.apache.org/jira/browse/AVRO-1124
>
> -Jay
>
>
> On Tue, Aug 20, 2013 at 7:08 AM, Mark <static.void....@gmail.com> wrote:
>
>> Can someone break down how message serialization would work with Avro?
>> I've read instead of adding a schema to every single event it would be wise
>> to add some sort of fingerprint with each message to identify which schema
>> it should use. What I'm having trouble understanding is, how do we read
>> the fingerprint without a schema? Don't we need the schema to deserialize?
>> The same question goes for working with Hadoop: how does the input format
>> know which schema to use?
>>
>> Thanks
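For what it's worth, the trick is that no JSON wrapper (and no schema) is needed to find the fingerprint: the framing uses a fixed-width binary header, so the consumer can split off the fingerprint with plain byte offsets before any Avro decoding happens. A minimal sketch of that idea follows; the magic byte, the MD5 fingerprint, and the function names here are assumptions for illustration, not the exact format from the paper or AVRO-1124:

```python
import hashlib

MAGIC_BYTE = 0  # hypothetical framing-version marker, not part of Avro itself


def frame(schema_json: str, avro_payload: bytes) -> bytes:
    """Prefix the Avro-encoded bytes with a fixed-size schema fingerprint."""
    fingerprint = hashlib.md5(schema_json.encode("utf-8")).digest()  # 16 bytes
    return bytes([MAGIC_BYTE]) + fingerprint + avro_payload


def unframe(message: bytes):
    """Split a framed message without any schema: the header has a fixed
    length, so byte offsets alone are enough to recover the fingerprint."""
    if message[0] != MAGIC_BYTE:
        raise ValueError("unknown framing version")
    fingerprint = message[1:17]   # look this up in the schema repository
    avro_payload = message[17:]   # decode with Avro once the schema is fetched
    return fingerprint, avro_payload
```

The consumer reads the 16-byte fingerprint, fetches the full schema from the repository (caching it locally), and only then hands the remaining bytes to the Avro decoder. A Hadoop input format can do the same per record, or rely on the schema stored in the Avro container file header.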