… or is the payload of the message prepended with a magic byte followed by the SHA?
On Aug 22, 2013, at 9:49 AM, Mark <static.void....@gmail.com> wrote:

> Are you referring to the same message class as:
> https://github.com/apache/kafka/blob/0.7/core/src/main/scala/kafka/message/Message.scala
> or are you talking about a wrapper around this message class which has its
> own magic byte followed by the SHA of the schema? If it's the former, I'm confused.
>
> FYI, it looks like Camus gets a 4-byte identifier from a schema registry:
> https://github.com/linkedin/camus/blob/master/camus-etl-kafka/src/main/java/com/linkedin/camus/etl/kafka/coders/KafkaAvroMessageEncoder.java
>
> On Aug 22, 2013, at 9:37 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
>
>> The point of the magic byte is to indicate the current version of the
>> message format. One part of the format is the fact that it is Avro encoded.
>> I'm not sure how Camus gets a 4-byte id, but at LinkedIn we use the 16-byte
>> MD5 hash of the schema. Since AVRO-1124 is not resolved yet, I'm not sure
>> if I can comment on the compatibility just yet.
>>
>> Thanks,
>> Neha
>>
>> On Wed, Aug 21, 2013 at 9:00 PM, Mark <static.void....@gmail.com> wrote:
>>
>>> Neha, thanks for the response.
>>>
>>> So the only point of the magic byte is to indicate that the rest of the
>>> message is Avro encoded? I noticed that in Camus a 4-byte int id of the
>>> schema is written instead of the 16-byte SHA. Is this the new preferred
>>> way? Which is compatible with
>>> https://issues.apache.org/jira/browse/AVRO-1124?
>>>
>>> Thanks again
>>>
>>> On Aug 21, 2013, at 8:38 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
>>>
>>>> We define the LinkedIn Kafka message to have a magic byte (indicating
>>>> Avro serialization) and an MD5 header, followed by the payload. The
>>>> Hadoop consumer reads the MD5, looks up the schema in the repository,
>>>> and deserializes the message.
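The envelope Neha describes (a magic byte, then the 16-byte MD5 of the schema, then the Avro-encoded payload) can be sketched roughly as below. This is not LinkedIn's actual code: the class and method names are invented for illustration, the magic value is a placeholder, and plain bytes stand in for the Avro-serialized record.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Envelope {
    // Illustrative version marker; the real value is an internal detail.
    static final byte MAGIC = 0x0;
    static final int MD5_LEN = 16;

    // 16-byte MD5 fingerprint of the schema text, used as the lookup key.
    static byte[] schemaMd5(String schemaJson) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(schemaJson.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e); // MD5 is always present on the JVM
        }
    }

    // Producer side: [magic(1)][md5(16)][avro payload]
    static byte[] wrap(byte[] md5, byte[] avroPayload) {
        return ByteBuffer.allocate(1 + MD5_LEN + avroPayload.length)
                .put(MAGIC).put(md5).put(avroPayload).array();
    }

    // Consumer side: verify the magic byte, split out the MD5 (for the
    // schema-repository lookup) and the payload (for Avro decoding).
    static byte[][] unwrap(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        if (buf.get() != MAGIC) {
            throw new IllegalArgumentException("unknown magic byte");
        }
        byte[] md5 = new byte[MD5_LEN];
        buf.get(md5);
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        return new byte[][] { md5, payload };
    }
}
```

The Hadoop consumer's job then reduces to: read one message, take bytes 1..16 as the repository key, fetch the writer schema, and hand the remaining bytes to the Avro decoder.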
>>>>
>>>> Thanks,
>>>> Neha
>>>>
>>>> On Wed, Aug 21, 2013 at 8:15 PM, Mark <static.void....@gmail.com> wrote:
>>>>
>>>>> Does LinkedIn include the SHA of the schema in the header of each Avro
>>>>> message they write, or do they wrap the Avro message and prepend the SHA?
>>>>>
>>>>> In either case, how does the Hadoop consumer know what schema to read?
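For comparison, the Camus-style framing Mark points at swaps the 16-byte MD5 for a 4-byte int id assigned by a schema registry. A hedged sketch, assuming a registry that hands out compact ids (the Map here is a stand-in, not Camus's actual registry interface):

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class IntIdEnvelope {
    static final byte MAGIC = 0x0; // illustrative version marker

    // Stand-in for a real schema registry mapping compact int ids to schemas.
    static final Map<Integer, String> REGISTRY = new HashMap<>();

    // Producer side: [magic(1)][schemaId(4, big-endian)][avro payload]
    static byte[] wrap(int schemaId, byte[] avroPayload) {
        return ByteBuffer.allocate(1 + 4 + avroPayload.length)
                .put(MAGIC).putInt(schemaId).put(avroPayload).array();
    }

    // Consumer side: verify the magic byte and read the id; the schema is
    // then resolved through the registry rather than by hashing.
    static int schemaIdOf(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        if (buf.get() != MAGIC) {
            throw new IllegalArgumentException("unknown magic byte");
        }
        return buf.getInt();
    }
}
```

The trade-off the thread is circling: a hash header is self-describing (any consumer can recompute it from a schema), while an int id needs a central registry but costs only 4 bytes per message instead of 16.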