Hi, Yi:

Thanks a lot for your reply with the information about the Avro schema registry. After your reply I studied Avro messages on Kafka: once encoded, each message automatically carries [magic byte][schema id][actual message]. You mentioned, "It is a specific way of maintaining compatibility between producer and consumer in LinkedIn." I am wondering how this works. Is there an "AvroSchemaRegistry" API for Samza, Kafka, or Avro? Do you know of a link to this API, or a link to a code example?

In other words: if I send messages with schema id 1 to topic "temp", and later add or delete a field so that the schema changes, I then send messages with schema id 2 to the same topic. When I consume from "temp", how can I decode the messages? Do I need the schema id, and if so, how can I get it? Do Kafka, Samza, or Avro implement this?
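To make my question concrete, here is a minimal sketch of what I imagine the consumer side would have to do. I am assuming the common framing where the schema id is a 4-byte big-endian integer after the magic byte; this is not an official Samza/Kafka API, and the registry lookup itself is left out as a hypothetical step:

```python
import struct

# Sketch: split a framed Kafka message into (schema id, Avro payload),
# assuming the framing [1-byte magic][4-byte schema id, big-endian][payload].
# A real consumer would then fetch the writer schema for that id from the
# schema registry service and decode the payload with an Avro reader.

MAGIC_BYTE = 0

def parse_framed_message(message: bytes):
    """Return (schema_id, payload) from a framed message, or raise on bad input."""
    if len(message) < 5:
        raise ValueError("message too short for magic byte + schema id")
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("unexpected magic byte: %d" % magic)
    return schema_id, message[5:]

# Example: a message written under (hypothetical) schema id 2
framed = bytes([MAGIC_BYTE]) + struct.pack(">I", 2) + b"\x02hi"
schema_id, payload = parse_framed_message(framed)
print(schema_id)  # 2
```

Is this roughly the mechanism, with the registry supplying the writer schema for the extracted id?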
Sincerely,
Selina

On Wed, Nov 18, 2015 at 5:29 PM, Yi Pan <nickpa...@gmail.com> wrote:

> Hi, Selina,
>
> Samza's producer/consumer is highly tunable. You can configure it to use a
> ProtocolBufferSerde class if your messages in Kafka are in Protocol Buffer
> format. The use of Avro in Kafka is LinkedIn's choice and does not
> necessarily fit others.
>
> As for why LinkedIn uses Avro, here is the biggest reason: LinkedIn uses
> the Avro schema registry to ensure that producers and consumers are using
> compatible Avro schema versions. It is a specific way of maintaining
> compatibility between producer and consumer in LinkedIn. ProtoBuf does not
> seem to have schema registry functionality and requires re-compilation to
> make sure producer and consumer are compatible on the wire format of the
> message.
>
> If you have other ways to maintain compatibility between producers and
> consumers using ProtoBuf, I don't see why you cannot use ProtoBuf in Samza.
>
> Best,
>
> -Yi
>
> On Wed, Nov 18, 2015 at 3:43 PM, Selina Tech <swucaree...@gmail.com>
> wrote:
>
> > Dear All:
> >
> > I need to generate some data with Samza to Kafka and then write it to a
> > Parquet-format file. I was asked why I chose Avro as my Samza output
> > type to Kafka instead of Protocol Buffers, since currently all of our
> > data on Kafka is in Protocol Buffers.
> > I explained the advantages of Avro-encoded messages: the encoded
> > size is smaller, no extra code compilation is needed, implementation is
> > easier, serialization/deserialization is fast, and many languages are
> > supported. However, some people believe that an encoded Avro message
> > takes as much space as a Protocol Buffers one, and that with the schema
> > included the size could be much bigger.
> >
> > I am wondering if there are any other advantages that made you
> > choose Avro as your message type on Kafka?
> >
> > Sincerely,
> > Selina
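Regarding the compatibility checking Yi describes, here is a deliberately simplified sketch of one rule a schema registry can enforce: a reader on the new schema can still decode data written with the old schema only if every newly added field has a default value. The schemas and the `backward_compatible` helper are illustrative only; real Avro schema resolution covers many more cases (type promotion, removed fields, unions, aliases):

```python
# Simplified backward-compatibility check between two Avro record schemas
# (given as plain dicts): any field present in the new schema but absent
# from the old one must carry a "default", so readers can fill it in when
# decoding data written under the old schema.

def backward_compatible(old_schema: dict, new_schema: dict) -> bool:
    old_fields = {f["name"] for f in old_schema["fields"]}
    for field in new_schema["fields"]:
        if field["name"] not in old_fields and "default" not in field:
            return False  # added field without a default: old data can't supply it
    return True

v1 = {"type": "record", "name": "Temp",
      "fields": [{"name": "id", "type": "int"}]}
v2_ok = {"type": "record", "name": "Temp",
         "fields": [{"name": "id", "type": "int"},
                    {"name": "unit", "type": "string", "default": "C"}]}
v2_bad = {"type": "record", "name": "Temp",
          "fields": [{"name": "id", "type": "int"},
                     {"name": "unit", "type": "string"}]}

print(backward_compatible(v1, v2_ok))   # True
print(backward_compatible(v1, v2_bad))  # False
```

A registry that rejects schemas like `v2_bad` at publish time is what keeps old consumers working while producers evolve the schema, which is the compatibility guarantee the thread is discussing.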