Looked at the patch. +1 from me.

On 11/24/14 8:29 PM, "Gwen Shapira" <gshap...@cloudera.com> wrote:
>As one of the people who spent too much time building Avro repositories, +1
>on bringing the serializer API back.
>
>I think it will make the new producer easier to work with.
>
>Gwen
>
>On Mon, Nov 24, 2014 at 6:13 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
>> This is admittedly late in the release cycle to make a change. To add to
>> Jun's description, the motivation was that we felt it would be better to
>> change that interface now rather than after the release if it needed to
>> change.
>>
>> The motivation for wanting to make a change was the ability to really be
>> able to develop support for Avro and other serialization formats. The
>> current status is pretty scattered--there is a schema repository on an Avro
>> JIRA and another fork of that on github, and a bunch of people we have
>> talked to have done similar things for other serialization systems. It
>> would be nice if these things could be packaged in such a way that it was
>> possible to just change a few configs in the producer and get rich metadata
>> support for messages.
>>
>> As we were thinking this through, we realized that the new api we were
>> about to introduce was not very compatible with this, since it was just
>> byte[] oriented.
>>
>> You can always do this by adding some kind of wrapper api that wraps the
>> producer. But this puts us back in the position of trying to document and
>> support multiple interfaces.
>>
>> This also opens up the possibility of adding a MessageValidator or
>> MessageInterceptor plug-in transparently, so that you can do other custom
>> validation on the messages you are sending, which obviously requires
>> access to the original object, not the byte array.
>>
>> This api doesn't prevent using byte[]: by configuring the
>> ByteArraySerializer, it works as it currently does.
>>
>> -Jay
>>
>> On Mon, Nov 24, 2014 at 5:58 PM, Jun Rao <jun...@gmail.com> wrote:
>>
>> > Hi, Everyone,
>> >
>> > I'd like to start a discussion on whether it makes sense to add the
>> > serializer api back to the new java producer. Currently, the new java
>> > producer takes a byte array for both the key and the value. While this
>> > api is simple, it pushes the serialization logic into the application.
>> > This makes it hard to reason about what type of data is being sent to
>> > Kafka and also makes it hard to share an implementation of the
>> > serializer. For example, to support Avro, the serialization logic could
>> > be quite involved, since it might need to register the Avro schema in
>> > some remote registry and maintain a schema cache locally, etc. Without a
>> > serialization api, it's impossible to share such an implementation so
>> > that people can easily reuse it. We sort of overlooked this implication
>> > during the initial discussion of the producer api.
>> >
>> > So, I'd like to propose an api change to the new producer by adding back
>> > the serializer api, similar to what we had in the old producer.
>> > Specifically, the proposed api changes are the following.
>> >
>> > First, we change KafkaProducer to take generic types K and V for the key
>> > and the value, respectively.
>> >
>> > public class KafkaProducer<K,V> implements Producer<K,V> {
>> >
>> >     public Future<RecordMetadata> send(ProducerRecord<K,V> record,
>> >                                        Callback callback);
>> >
>> >     public Future<RecordMetadata> send(ProducerRecord<K,V> record);
>> > }
>> >
>> > Second, we add two new configs, one for the key serializer and another
>> > for the value serializer. Both serializers will default to the byte
>> > array implementation.
>> >
>> > public class ProducerConfig extends AbstractConfig {
>> >
>> >     .define(KEY_SERIALIZER_CLASS_CONFIG, Type.CLASS,
>> >             "org.apache.kafka.clients.producer.ByteArraySerializer",
>> >             Importance.HIGH, KEY_SERIALIZER_CLASS_DOC)
>> >     .define(VALUE_SERIALIZER_CLASS_CONFIG, Type.CLASS,
>> >             "org.apache.kafka.clients.producer.ByteArraySerializer",
>> >             Importance.HIGH, VALUE_SERIALIZER_CLASS_DOC);
>> > }
>> >
>> > Both serializers will implement the following interface.
>> >
>> > public interface Serializer<T> extends Configurable {
>> >     public byte[] serialize(String topic, T data, boolean isKey);
>> >
>> >     public void close();
>> > }
>> >
>> > This is more or less the same as what's in the old producer. The slight
>> > differences are (1) the serializer now only requires a parameter-less
>> > constructor; (2) the serializer has a configure() and a close() method
>> > for initialization and cleanup, respectively; (3) the serialize() method
>> > additionally takes the topic and an isKey indicator, both of which are
>> > useful for things like schema registration.
>> >
>> > The detailed changes are included in KAFKA-1797. For completeness, I
>> > also made the corresponding changes for the new java consumer api.
>> >
>> > Note that the proposed api changes are incompatible with what's in the
>> > 0.8.2 branch. However, if those api changes are beneficial, it's
>> > probably better to include them now in the 0.8.2 release, rather than
>> > later.
>> >
>> > I'd like to discuss mainly two things in this thread.
>> > 1. Do people feel that the proposed api changes are reasonable?
>> > 2. Are there any concerns of including the api changes in the 0.8.2
>> > final release?
>> >
>> > Thanks,
>> >
>> > Jun
>> >
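[For readers following along: a self-contained sketch of what implementing the Serializer interface proposed above might look like. The Configurable interface and the StringSerializer class here are illustrative stand-ins defined locally, not the actual classes from KAFKA-1797.]

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Stand-in for the Configurable interface the proposal extends (assumption:
// it carries a single configure() method taking the config map).
interface Configurable {
    void configure(Map<String, ?> configs);
}

// The serializer interface as proposed in the thread: a parameter-less
// constructor, configure()/close() for lifecycle, and a serialize() that also
// receives the topic and an isKey flag (useful for schema registration).
interface Serializer<T> extends Configurable {
    byte[] serialize(String topic, T data, boolean isKey);
    void close();
}

// A hypothetical UTF-8 string serializer written against that interface.
class StringSerializer implements Serializer<String> {
    @Override
    public void configure(Map<String, ?> configs) {
        // No configuration needed for this sketch; a schema-aware serializer
        // might read a registry URL here.
    }

    @Override
    public byte[] serialize(String topic, String data, boolean isKey) {
        // Topic and isKey are unused here, but an Avro serializer could use
        // them to register/look up the right schema per topic and per field.
        return data == null ? null : data.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public void close() {
        // Nothing to clean up in this sketch.
    }
}

public class SerializerSketch {
    public static void main(String[] args) {
        Serializer<String> ser = new StringSerializer();
        byte[] bytes = ser.serialize("my-topic", "hello", false);
        System.out.println(bytes.length); // "hello" is 5 bytes in UTF-8
        ser.close();
    }
}
```

With the generic KafkaProducer<K,V>, such a class would be wired in via the proposed key.serializer/value.serializer configs rather than constructed by the application, which is what makes the implementation shareable.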