Hi, Yi:

Thanks a lot for your reply with the information about the Avro schema registry. After your reply I studied Avro messages on Kafka: once encoded, each message automatically carries [magic byte][schema id][actual message]. You mentioned, "It is a specific way of maintaining compatibility between producer and consumer in LinkedIn." I am wondering how this works. Is there an "AvroSchemaRegistry" API for Samza, Kafka, or Avro? Do you know of a link to this API, or a link to a code example?

In other words: if I send messages with schema id 1 to topic "temp", and later add or delete a field so that the schema changes, I then send messages with schema id 2 to the same topic. When I consume from "temp", how can I decode the messages? Do I need the schema id, and if so, how can I get it? Do Kafka, Samza, or Avro implement this?
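To make my question concrete, here is a minimal sketch of what I imagine the consumer side would have to do. I am assuming the common framing where the schema id is a 4-byte big-endian integer after the magic byte; this is not an official Samza/Kafka API, and the registry lookup itself is left out as a hypothetical step:

```python
import struct

# Sketch: split a framed Kafka message into (schema id, Avro payload),
# assuming the framing [1-byte magic][4-byte schema id, big-endian][payload].
# A real consumer would then fetch the writer schema for that id from the
# schema registry service and decode the payload with an Avro reader.

MAGIC_BYTE = 0

def parse_framed_message(message: bytes):
    """Return (schema_id, payload) from a framed message, or raise on bad input."""
    if len(message) < 5:
        raise ValueError("message too short for magic byte + schema id")
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("unexpected magic byte: %d" % magic)
    return schema_id, message[5:]

# Example: a message written under (hypothetical) schema id 2
framed = bytes([MAGIC_BYTE]) + struct.pack(">I", 2) + b"\x02hi"
schema_id, payload = parse_framed_message(framed)
print(schema_id)  # 2
```

Is this roughly the mechanism, with the registry supplying the writer schema for the extracted id?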
Sincerely,
Selina

On Wed, Nov 18, 2015 at 5:29 PM, Yi Pan <nickpa...@gmail.com> wrote:

> Hi, Selina,
>
> Samza's producer/consumer is highly tunable. You can configure it to use a
> ProtocolBufferSerde class if your messages in Kafka are in Protocol Buffer
> format. The use of Avro in Kafka is LinkedIn's choice and does not
> necessarily fit others.
>
> As for why LinkedIn uses Avro, here is the biggest reason: LinkedIn uses
> the Avro schema registry to ensure that producers and consumers are using
> compatible Avro schema versions. It is a specific way of maintaining
> compatibility between producer and consumer in LinkedIn. ProtoBuf does not
> seem to have schema registry functionality and requires re-compilation to
> make sure producer and consumer are compatible on the wire format of the
> message.
>
> If you have other ways to maintain compatibility between producers and
> consumers using ProtoBuf, I don't see why you cannot use ProtoBuf in Samza.
>
> Best,
>
> -Yi
>
> On Wed, Nov 18, 2015 at 3:43 PM, Selina Tech <swucaree...@gmail.com>
> wrote:
>
> > Dear All:
> >
> > I need to generate some data with Samza to Kafka and then write it to a
> > Parquet-format file. I was asked why I chose Avro as my Samza output
> > type to Kafka instead of Protocol Buffers, since currently all of our
> > data on Kafka is in Protocol Buffers.
> > I explained the advantages of Avro-encoded messages: the encoded
> > size is smaller, no extra code compilation is needed, implementation is
> > easier, serialization/deserialization is fast, and many languages are
> > supported. However, some people believe that an encoded Avro message
> > takes as much space as a Protocol Buffers one, and that with the schema
> > included the size could be much bigger.
> >
> > I am wondering if there are any other advantages that made you
> > choose Avro as your message type on Kafka?
> >
> > Sincerely,
> > Selina
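Regarding the compatibility checking Yi describes, here is a deliberately simplified sketch of one rule a schema registry can enforce: a reader on the new schema can still decode data written with the old schema only if every newly added field has a default value. The schemas and the `backward_compatible` helper are illustrative only; real Avro schema resolution covers many more cases (type promotion, removed fields, unions, aliases):

```python
# Simplified backward-compatibility check between two Avro record schemas
# (given as plain dicts): any field present in the new schema but absent
# from the old one must carry a "default", so readers can fill it in when
# decoding data written under the old schema.

def backward_compatible(old_schema: dict, new_schema: dict) -> bool:
    old_fields = {f["name"] for f in old_schema["fields"]}
    for field in new_schema["fields"]:
        if field["name"] not in old_fields and "default" not in field:
            return False  # added field without a default: old data can't supply it
    return True

v1 = {"type": "record", "name": "Temp",
      "fields": [{"name": "id", "type": "int"}]}
v2_ok = {"type": "record", "name": "Temp",
         "fields": [{"name": "id", "type": "int"},
                    {"name": "unit", "type": "string", "default": "C"}]}
v2_bad = {"type": "record", "name": "Temp",
          "fields": [{"name": "id", "type": "int"},
                     {"name": "unit", "type": "string"}]}

print(backward_compatible(v1, v2_ok))   # True
print(backward_compatible(v1, v2_bad))  # False
```

A registry that rejects schemas like `v2_bad` at publish time is what keeps old consumers working while producers evolve the schema, which is the compatibility guarantee the thread is discussing.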