Looked at the patch. +1 from me.

On 11/24/14 8:29 PM, "Gwen Shapira" <gshap...@cloudera.com> wrote:
>As one of the people who spent too much time building Avro repositories, +1
>on bringing the serializer API back.
>
>I think it will make the new producer easier to work with.
>
>Gwen
>
>On Mon, Nov 24, 2014 at 6:13 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
>> This is admittedly late in the release cycle to make a change. To add to
>> Jun's description, the motivation was that we felt it would be better to
>> change that interface now rather than after the release if it needed to
>> change.
>>
>> The motivation for wanting to make a change was the ability to really be
>> able to develop support for Avro and other serialization formats. The
>> current status is pretty scattered--there is a schema repository on an Avro
>> JIRA and another fork of that on github, and a bunch of people we have
>> talked to have done similar things for other serialization systems. It
>> would be nice if these things could be packaged in such a way that it was
>> possible to just change a few configs in the producer and get rich metadata
>> support for messages.
>>
>> As we were thinking this through, we realized that the new api we were
>> about to introduce was not very compatible with this, since it was just
>> byte[] oriented.
>>
>> You can always do this by adding some kind of wrapper api that wraps the
>> producer. But this puts us back in the position of trying to document and
>> support multiple interfaces.
>>
>> This also opens up the possibility of adding a MessageValidator or
>> MessageInterceptor plug-in transparently, so that you can do other custom
>> validation on the messages you are sending, which obviously requires
>> access to the original object, not the byte array.
>>
>> This api doesn't prevent using byte[]: by configuring the
>> ByteArraySerializer, it works as it currently does.
>>
>> -Jay
>>
>> On Mon, Nov 24, 2014 at 5:58 PM, Jun Rao <jun...@gmail.com> wrote:
>>
>> > Hi, Everyone,
>> >
>> > I'd like to start a discussion on whether it makes sense to add the
>> > serializer api back to the new java producer. Currently, the new java
>> > producer takes a byte array for both the key and the value. While this
>> > api is simple, it pushes the serialization logic into the application.
>> > This makes it hard to reason about what type of data is being sent to
>> > Kafka and also makes it hard to share an implementation of the
>> > serializer. For example, to support Avro, the serialization logic could
>> > be quite involved, since it might need to register the Avro schema in
>> > some remote registry and maintain a schema cache locally, etc. Without a
>> > serialization api, it's impossible to share such an implementation so
>> > that people can easily reuse it. We sort of overlooked this implication
>> > during the initial discussion of the producer api.
>> >
>> > So, I'd like to propose an api change to the new producer by adding back
>> > the serializer api, similar to what we had in the old producer.
>> > Specifically, the proposed api changes are the following.
>> >
>> > First, we change KafkaProducer to take generic types K and V for the key
>> > and the value, respectively.
>> >
>> > public class KafkaProducer<K,V> implements Producer<K,V> {
>> >
>> >     public Future<RecordMetadata> send(ProducerRecord<K,V> record,
>> >                                        Callback callback);
>> >
>> >     public Future<RecordMetadata> send(ProducerRecord<K,V> record);
>> > }
>> >
>> > Second, we add two new configs, one for the key serializer and another
>> > for the value serializer. Both serializers will default to the byte
>> > array implementation.
>> >
>> > public class ProducerConfig extends AbstractConfig {
>> >
>> >     .define(KEY_SERIALIZER_CLASS_CONFIG, Type.CLASS,
>> >             "org.apache.kafka.clients.producer.ByteArraySerializer",
>> >             Importance.HIGH, KEY_SERIALIZER_CLASS_DOC)
>> >     .define(VALUE_SERIALIZER_CLASS_CONFIG, Type.CLASS,
>> >             "org.apache.kafka.clients.producer.ByteArraySerializer",
>> >             Importance.HIGH, VALUE_SERIALIZER_CLASS_DOC);
>> > }
>> >
>> > Both serializers will implement the following interface.
>> >
>> > public interface Serializer<T> extends Configurable {
>> >     public byte[] serialize(String topic, T data, boolean isKey);
>> >
>> >     public void close();
>> > }
>> >
>> > This is more or less the same as what's in the old producer. The slight
>> > differences are (1) the serializer now only requires a parameter-less
>> > constructor; (2) the serializer has a configure() and a close() method
>> > for initialization and cleanup, respectively; (3) the serialize() method
>> > additionally takes the topic and an isKey indicator, both of which are
>> > useful for things like schema registration.
>> >
>> > The detailed changes are included in KAFKA-1797. For completeness, I
>> > also made the corresponding changes for the new java consumer api.
>> >
>> > Note that the proposed api changes are incompatible with what's in the
>> > 0.8.2 branch. However, if those api changes are beneficial, it's
>> > probably better to include them now in the 0.8.2 release, rather than
>> > later.
>> >
>> > I'd like to discuss mainly two things in this thread.
>> > 1. Do people feel that the proposed api changes are reasonable?
>> > 2. Are there any concerns of including the api changes in the 0.8.2
>> > final release?
>> >
>> > Thanks,
>> >
>> > Jun
>> >
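[For readers following along: a self-contained sketch of what implementing the Serializer interface proposed above might look like. The Configurable interface and the StringSerializer class here are illustrative stand-ins defined locally, not the actual classes from KAFKA-1797.]

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Stand-in for the Configurable interface the proposal extends (assumption:
// it carries a single configure() method taking the config map).
interface Configurable {
    void configure(Map<String, ?> configs);
}

// The serializer interface as proposed in the thread: a parameter-less
// constructor, configure()/close() for lifecycle, and a serialize() that also
// receives the topic and an isKey flag (useful for schema registration).
interface Serializer<T> extends Configurable {
    byte[] serialize(String topic, T data, boolean isKey);
    void close();
}

// A hypothetical UTF-8 string serializer written against that interface.
class StringSerializer implements Serializer<String> {
    @Override
    public void configure(Map<String, ?> configs) {
        // No configuration needed for this sketch; a schema-aware serializer
        // might read a registry URL here.
    }

    @Override
    public byte[] serialize(String topic, String data, boolean isKey) {
        // Topic and isKey are unused here, but an Avro serializer could use
        // them to register/look up the right schema per topic and per field.
        return data == null ? null : data.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public void close() {
        // Nothing to clean up in this sketch.
    }
}

public class SerializerSketch {
    public static void main(String[] args) {
        Serializer<String> ser = new StringSerializer();
        byte[] bytes = ser.serialize("my-topic", "hello", false);
        System.out.println(bytes.length); // "hello" is 5 bytes in UTF-8
        ser.close();
    }
}
```

With the generic KafkaProducer<K,V>, such a class would be wired in via the proposed key.serializer/value.serializer configs rather than constructed by the application, which is what makes the implementation shareable.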