Are you referring to the same message class as
https://github.com/apache/kafka/blob/0.7/core/src/main/scala/kafka/message/Message.scala
or are you talking about a wrapper around this message class that has its own
magic byte followed by the SHA of the schema? If it's the former, I'm confused.


FYI, it looks like Camus gets a 4-byte identifier from a schema registry:

https://github.com/linkedin/camus/blob/master/camus-etl-kafka/src/main/java/com/linkedin/camus/etl/kafka/coders/KafkaAvroMessageEncoder.java
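
From a quick read, the envelope appears to be one magic byte, a 4-byte schema
id from the registry, and then the Avro payload. A minimal sketch of that
framing in Java (the MAGIC value, the field layout, and the registry lookup
are my assumptions for illustration, not the actual Camus API):

    import java.nio.ByteBuffer;

    // Sketch of a Camus-style envelope: 1 magic byte + 4-byte schema id + Avro payload.
    // MAGIC and the layout are assumptions from skimming the encoder, not a spec.
    public class FramedEncoder {
        private static final byte MAGIC = 0x0;

        // 'schemaId' would come from a schema registry; 'avroBytes' is the
        // already-serialized Avro record.
        public static byte[] frame(int schemaId, byte[] avroBytes) {
            ByteBuffer buf = ByteBuffer.allocate(1 + 4 + avroBytes.length);
            buf.put(MAGIC);        // format/version marker
            buf.putInt(schemaId);  // 4-byte registry id (big-endian)
            buf.put(avroBytes);    // Avro-encoded payload
            return buf.array();
        }
    }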


On Aug 22, 2013, at 9:37 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:

> The point of the magic byte is to indicate the current version of the
> message format. One part of the format is that it is Avro encoded. I'm not
> sure how Camus gets a 4-byte id, but at LinkedIn we use the 16-byte MD5
> hash of the schema. Since AVRO-1124 is not resolved yet, I'm not sure I
> can comment on the compatibility just yet.
> 
> Thanks,
> Neha
> 
> 
> On Wed, Aug 21, 2013 at 9:00 PM, Mark <static.void....@gmail.com> wrote:
> 
>> Neha, thanks for the response.
>> 
>> So the only point of the magic byte is to indicate that the rest of the
>> message is Avro encoded? I noticed that Camus writes a 4-byte int id of
>> the schema instead of the 16-byte SHA. Is this the new preferred way?
>> Which approach is compatible with
>> https://issues.apache.org/jira/browse/AVRO-1124?
>> 
>> Thanks again
>> 
>> On Aug 21, 2013, at 8:38 PM, Neha Narkhede <neha.narkh...@gmail.com>
>> wrote:
>> 
>>> We define the LinkedIn Kafka message to have a magic byte (indicating
>>> Avro serialization), an MD5 header, and then the payload. The Hadoop
>>> consumer reads the MD5, looks up the schema in the repository, and
>>> deserializes the message.
>>> 
>>> Thanks,
>>> Neha
>>> 
>>> 
>>> On Wed, Aug 21, 2013 at 8:15 PM, Mark <static.void....@gmail.com> wrote:
>>> 
>>>> Does LinkedIn include the SHA of the schema in the header of each Avro
>>>> message they write, or do they wrap the Avro message and prepend the SHA?
>>>> 
>>>> In either case, how does the Hadoop consumer know which schema to use
>>>> when reading?
>> 
>> 
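
For reference, a minimal sketch of the decode path Neha describes above: read
the magic byte, read the 16-byte MD5 of the writer schema, look the schema up,
then deserialize the remaining bytes as Avro. The SchemaRepository interface
here is hypothetical, since LinkedIn's repository API isn't shown in the
thread:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.DecoderFactory;

    public class Md5FramedDecoder {
        // Hypothetical lookup: maps the 16-byte schema MD5 to the writer schema.
        public interface SchemaRepository {
            Schema findByMd5(byte[] md5);
        }

        public static GenericRecord decode(byte[] message, SchemaRepository repo)
                throws IOException {
            ByteBuffer buf = ByteBuffer.wrap(message);
            byte magic = buf.get();   // 1-byte marker (version check elided here)
            byte[] md5 = new byte[16];
            buf.get(md5);             // 16-byte MD5 of the writer schema
            Schema writerSchema = repo.findByMd5(md5);

            // Deserialize the rest of the message with the writer's schema.
            GenericDatumReader<GenericRecord> reader =
                    new GenericDatumReader<>(writerSchema);
            return reader.read(null, DecoderFactory.get().binaryDecoder(
                    message, buf.position(), buf.remaining(), null));
        }
    }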
