… or is the payload of the message prepended with a magic byte followed by the 

On Aug 22, 2013, at 9:49 AM, Mark <static.void....@gmail.com> wrote:

> Are you referring to the same message class as: 
> https://github.com/apache/kafka/blob/0.7/core/src/main/scala/kafka/message/Message.scala
>  or are you talking about a wrapper around this message class which has its 
> own magic byte followed by the SHA of the schema? If it's the former, I'm confused. 
> FYI, it looks like Camus gets a 4 byte identifier from a schema registry.
> https://github.com/linkedin/camus/blob/master/camus-etl-kafka/src/main/java/com/linkedin/camus/etl/kafka/coders/KafkaAvroMessageEncoder.java
> On Aug 22, 2013, at 9:37 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
>> The point of the magic byte is to indicate the current version of the
>> message format. One part of the format is the fact that it is Avro encoded.
>> I'm not sure how Camus gets a 4 byte id, but at LinkedIn we use the 16 byte
>> MD5 hash of the schema. Since AVRO-1124 is not resolved yet, I'm not sure
>> if I can comment on the compatibility just yet.
>> Thanks,
>> Neha
>> On Wed, Aug 21, 2013 at 9:00 PM, Mark <static.void....@gmail.com> wrote:
>>> Neha, thanks for the response.
>>> So the only point of the magic byte is to indicate that the rest of the
>>> message is Avro encoded? I noticed that in Camus a 4 byte int id of the
>>> schema is written instead of the 16 byte SHA. Is this the new preferred
>>> way? Which is compatible with
>>> https://issues.apache.org/jira/browse/AVRO-1124?
>>> Thanks again
>>> On Aug 21, 2013, at 8:38 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
>>>> We define the LinkedIn Kafka message to have a magic byte (indicating Avro
>>>> serialization), MD5 header followed by the payload. The Hadoop consumer
>>>> reads the MD5, looks up the schema in the repository and deserializes the
>>>> message.
>>>> Thanks,
>>>> Neha
>>>> On Wed, Aug 21, 2013 at 8:15 PM, Mark <static.void....@gmail.com> wrote:
>>>>> Does LinkedIn include the SHA of the schema in the header of each Avro
>>>>> message they write, or do they wrap the Avro message and prepend the SHA?
>>>>> In either case, how does the Hadoop consumer know what schema to read?
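
For readers following the thread, the layout Neha describes (a magic byte, then the 16 byte MD5 of the schema, then the Avro payload) can be sketched roughly as follows. This is only an illustration, not LinkedIn's or Camus's actual code: the magic byte value, the choice to hash the raw schema text, the in-memory dict standing in for the schema repository, and the function names `schema_fingerprint`, `frame`, and `unframe` are all assumptions. The Avro payload is treated as opaque bytes so the sketch needs no Avro library.

```python
import hashlib

# Hypothetical value; the thread does not say what the magic byte is.
MAGIC_BYTE = 0x00
MD5_LEN = 16  # an MD5 digest is always 16 bytes


def schema_fingerprint(schema_json: str) -> bytes:
    """Return the 16 byte MD5 of the schema text.

    Assumption: the hash is taken over the raw UTF-8 schema string.
    A real system may hash a canonical form instead.
    """
    return hashlib.md5(schema_json.encode("utf-8")).digest()


def frame(schema_json: str, avro_payload: bytes) -> bytes:
    """Producer side: magic byte + schema MD5 + already-encoded Avro bytes."""
    return bytes([MAGIC_BYTE]) + schema_fingerprint(schema_json) + avro_payload


def unframe(message: bytes, registry: dict) -> tuple:
    """Consumer side: read the MD5, look up the schema, return (schema, payload).

    `registry` is a plain dict mapping fingerprint -> schema text, standing in
    for the schema repository mentioned in the thread.
    """
    if len(message) < 1 + MD5_LEN:
        raise ValueError("message too short to contain magic byte and MD5")
    if message[0] != MAGIC_BYTE:
        raise ValueError("unknown magic byte: %d" % message[0])
    fingerprint = message[1:1 + MD5_LEN]
    schema_json = registry[fingerprint]  # raises KeyError for an unknown schema
    return schema_json, message[1 + MD5_LEN:]
```

A consumer would then hand the returned schema and payload bytes to an Avro decoder. For example, with `schema = '{"type": "string"}'` and `registry = {schema_fingerprint(schema): schema}`, calling `unframe(frame(schema, b"\x06foo"), registry)` gives back the schema text and the original payload bytes.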

