Re: Avro vs Protocol buffer for Samza output

2015-11-19 Thread Selina Tech
Hi, Yi: Thanks a lot for your reply with the information about the Avro schema registry. After your reply I studied Avro messages on Kafka: once encoded, an Avro message automatically has the layout [magic byte][schema id][actual message]. You mentioned " It is a specific way of maintain

Re: Avro vs Protocol buffer for Samza output

2015-11-18 Thread Yi Pan
Yeah, this reduced-overhead message format calls for an Avro schema registry, so that you can look up the actual Avro schema via the schemaId.

On Wed, Nov 18, 2015 at 5:53 PM, Selina Tech wrote:
> Hi, Yi:
>
> I think I got the answer as below:
>
> "The Kafka message format starts

Re: Avro vs Protocol buffer for Samza output

2015-11-18 Thread Yi Pan
Hi, Selina,

On Wed, Nov 18, 2015 at 5:43 PM, Selina Tech wrote:
> Hi, Yi:
> Thanks for your reply. Do you mean there is no advantage of Avro
> message vs Protocol buffer message on Kafka except Avro schema registry?

Well, be careful about interpreting my words in this way. I did not

Re: Avro vs Protocol buffer for Samza output

2015-11-18 Thread Selina Tech
Hi, Yi: I think I got the answer as below: "The Kafka message format starts with a magic byte indicating what kind of serialization is used for this message. And if this byte indicates Avro, you can layout your message as starting with the schemaId and then followed by message payload. Upon c
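
The layout described above (a magic byte, then the schemaId, then the message payload) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Kafka, Samza, or LinkedIn implementation: the magic byte value `0`, the 4-byte big-endian schema id, and the names `frame`, `unframe`, and `MAGIC_AVRO` are all hypothetical choices made for this example. The payload is treated as opaque bytes; Avro encoding itself is not shown.

```python
import struct

# Assumption: magic byte 0 marks an Avro-serialized message.
MAGIC_AVRO = 0

# Assumption: 1-byte magic followed by a 4-byte big-endian unsigned schema id.
HEADER = struct.Struct(">bI")


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix an already-encoded payload with [magic byte][schema id]."""
    return HEADER.pack(MAGIC_AVRO, schema_id) + payload


def unframe(message: bytes) -> tuple[int, bytes]:
    """Split a framed message back into (schema_id, payload)."""
    if len(message) < HEADER.size:
        raise ValueError("message is shorter than the header")
    magic, schema_id = HEADER.unpack_from(message)
    if magic != MAGIC_AVRO:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[HEADER.size:]


if __name__ == "__main__":
    framed = frame(42, b"avro-encoded-bytes")
    schema_id, payload = unframe(framed)
    # A consumer would use schema_id to look up the schema in a registry
    # before decoding the payload.
    print(schema_id, payload)
```

With this kind of framing the per-message overhead is only the small fixed header, since the schema itself is not carried in the message.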

Re: Avro vs Protocol buffer for Samza output

2015-11-18 Thread Selina Tech
Hi, Yi: Thanks for your reply. Do you mean that Avro messages have no advantage over Protocol Buffers messages on Kafka apart from the Avro schema registry? BTW, do you know how Kafka implements Avro messages? Does each Avro message include the schema or not? The size of Avro message is a big c

Re: Avro vs Protocol buffer for Samza output

2015-11-18 Thread Yi Pan
Hi, Selina, Samza's producer/consumer is highly tunable. You can configure it to use a ProtocolBufferSerde class if your messages in Kafka are in Protocol Buffers format. The use of Avro in Kafka is LinkedIn's choice and does not necessarily fit others. As for "why LinkedIn uses Avro", here is