Hey Shlomi,

I agree that we just blew this one from a timing perspective. We ideally
should have thought this through in the original api discussion. But as we
really started to think about this area we realized that the existing api
made it really hard to provide a simple way to package the serialization and
data model stuff together. There is a saying that the best time to plant a
tree is a generation ago, but the second best time is right now, and I think
this is kind of along those lines.

This was really brought home to me because, for the last month or so, I have
been going around talking to a lot of people using Kafka, and essentially
every one of them has had to build some kind of wrapper api. There is nothing
so terrible about these wrappers, except that they make it hard to have
central documentation that explains how the system works, and they usually
strip off a lot of the functionality of the client, so you always have to
learn the in-house wrapper and can't do everything you could do with the main
client. All of these wrappers were trying to provide a few of the same
things: serialization, message validation, etc. All of these depend on having
access to the original object. I think if we make this change for
serialization we can later add any additional hooks for message validation
with no compatibility problems.
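
To make that concrete, here is a rough sketch of the kind of in-house wrapper
I mean. The class name and validation rule are made up purely for
illustration, and it assumes the current byte[]-only producer api:

import java.nio.charset.StandardCharsets;
import java.util.concurrent.Future;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class StringMessageProducer {
    // Wraps the byte[]-oriented producer; serialization and validation live
    // here, outside the client, and both need the original String.
    private final KafkaProducer producer;

    public StringMessageProducer(KafkaProducer producer) {
        this.producer = producer;
    }

    public Future<RecordMetadata> send(String topic, String message) {
        if (message == null || message.isEmpty())        // "message validation"
            throw new IllegalArgumentException("empty message");
        byte[] value = message.getBytes(StandardCharsets.UTF_8);  // "serialization"
        return producer.send(new ProducerRecord(topic, value));
    }
}

Every team ends up rebuilding some variant of this, which is exactly the
duplication the serializer change is meant to remove.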

-Jay

On Tue, Nov 25, 2014 at 12:12 AM, Shlomi Hazan <shl...@viber.com> wrote:

> Jun, while just a humble user, I would like to recall that it was just 6
> days ago that you told me on the user list that the producer is stable, when
> I asked which producer to go with and whether the new producer is production
> stable (you can still see that email down the list).
> Maybe I am missing something, but for me, stable includes the API.
> So it looks rather too big and too late from where I am standing to make
> this change now. This kind of change will introduce generics, add a major
> mandatory interface, and make the whole producer more complicated than it
> really has to be when you consider only Kafka and not Avro.
> I can see the obvious benefits for the many other use cases, but once you
> declare something stable it is usually expected that the API will not
> change unless something really big was discovered.
> Now it may be the case that you discovered something big enough, so
> personally I will not cast a vote.
> Whether the benefits make the change justifiable is for you guys to decide.
> Shlomi
>
> On Tue, Nov 25, 2014 at 6:43 AM, Sriram Subramanian <
> srsubraman...@linkedin.com.invalid> wrote:
>
> > Looked at the patch. +1 from me.
> >
> > On 11/24/14 8:29 PM, "Gwen Shapira" <gshap...@cloudera.com> wrote:
> >
> > >As one of the people who spent too much time building Avro repositories,
> > >+1 on bringing the serializer API back.
> > >
> > >I think it will make the new producer easier to work with.
> > >
> > >Gwen
> > >
> > >On Mon, Nov 24, 2014 at 6:13 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > >
> > >> This is admittedly late in the release cycle to make a change. To add to
> > >> Jun's description the motivation was that we felt it would be better to
> > >> change that interface now rather than after the release if it needed to
> > >> change.
> > >>
> > >> The motivation for wanting to make a change was to really be able to
> > >> develop support for Avro and other serialization formats. The current
> > >> status is pretty scattered--there is a schema repository on an Avro JIRA
> > >> and another fork of that on github, and a bunch of people we have talked
> > >> to have done similar things for other serialization systems. It would be
> > >> nice if these things could be packaged in such a way that it was possible
> > >> to just change a few configs in the producer and get rich metadata
> > >> support for messages.
> > >>
> > >> As we were thinking this through we realized that the new api we were
> > >> about to introduce was not really compatible with this since it was just
> > >> byte[] oriented.
> > >>
> > >> You can always do this by adding some kind of wrapper api that wraps the
> > >> producer. But this puts us back in the position of trying to document and
> > >> support multiple interfaces.
> > >>
> > >> This also opens up the possibility of adding a MessageValidator or
> > >> MessageInterceptor plug-in transparently so that you can do other custom
> > >> validation on the messages you are sending, which obviously requires
> > >> access to the original object, not the byte array.
> > >>
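> > >> Purely as an illustration of why such a hook needs the original object
> > >> (nothing like this is being proposed here), a hypothetical plug-in might
> > >> look like:
> > >>
> > >> public interface MessageValidator<K, V> {
> > >>     // Sees the record before serialization, so it can inspect K and V
> > >>     // directly; a byte[]-only api could not offer this.
> > >>     public void validate(ProducerRecord<K, V> record);
> > >> }
> > >>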
> > >> This api doesn't prevent using byte[]: if you configure the
> > >> ByteArraySerializer, it works as it currently does.
> > >>
> > >> -Jay
> > >>
> > >> On Mon, Nov 24, 2014 at 5:58 PM, Jun Rao <jun...@gmail.com> wrote:
> > >>
> > >> > Hi, Everyone,
> > >> >
> > >> > I'd like to start a discussion on whether it makes sense to add the
> > >> > serializer api back to the new java producer. Currently, the new java
> > >> > producer takes a byte array for both the key and the value. While this
> > >> > api is simple, it pushes the serialization logic into the application.
> > >> > This makes it hard to reason about what type of data is being sent to
> > >> > Kafka and also makes it hard to share an implementation of the
> > >> > serializer. For example, to support Avro, the serialization logic could
> > >> > be quite involved since it might need to register the Avro schema in
> > >> > some remote registry and maintain a schema cache locally, etc. Without
> > >> > a serialization api, it's impossible to share such an implementation so
> > >> > that people can easily reuse it. We sort of overlooked this implication
> > >> > during the initial discussion of the producer api.
> > >> >
> > >> > So, I'd like to propose an api change to the new producer by adding
> > >> > back the serializer api, similar to what we had in the old producer.
> > >> > Specifically, the proposed api changes are the following.
> > >> >
> > >> > First, we change KafkaProducer to take generic types K and V for the
> > >> > key and the value, respectively.
> > >> >
> > >> > public class KafkaProducer<K,V> implements Producer<K,V> {
> > >> >
> > >> >     public Future<RecordMetadata> send(ProducerRecord<K,V> record,
> > >> >                                        Callback callback);
> > >> >
> > >> >     public Future<RecordMetadata> send(ProducerRecord<K,V> record);
> > >> > }
> > >> >
> > >> > Second, we add two new configs, one for the key serializer and another
> > >> > for the value serializer. Both serializers will default to the byte
> > >> > array implementation.
> > >> >
> > >> > public class ProducerConfig extends AbstractConfig {
> > >> >
> > >> >     .define(KEY_SERIALIZER_CLASS_CONFIG, Type.CLASS,
> > >> >             "org.apache.kafka.clients.producer.ByteArraySerializer",
> > >> >             Importance.HIGH, KEY_SERIALIZER_CLASS_DOC)
> > >> >     .define(VALUE_SERIALIZER_CLASS_CONFIG, Type.CLASS,
> > >> >             "org.apache.kafka.clients.producer.ByteArraySerializer",
> > >> >             Importance.HIGH, VALUE_SERIALIZER_CLASS_DOC);
> > >> > }
> > >> >
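> > >> > For illustration only, an application might wire these up roughly like
> > >> > the sketch below (the bootstrap address is just a placeholder, and the
> > >> > defaults are spelled out explicitly even though they wouldn't need to
> > >> > be):
> > >> >
> > >> > Properties props = new Properties();
> > >> > props.put("bootstrap.servers", "localhost:9092");
> > >> > props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
> > >> >           "org.apache.kafka.clients.producer.ByteArraySerializer");
> > >> > props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
> > >> >           "org.apache.kafka.clients.producer.ByteArraySerializer");
> > >> >
> > >> > KafkaProducer<byte[], byte[]> producer =
> > >> >     new KafkaProducer<byte[], byte[]>(props);
> > >> > producer.send(new ProducerRecord<byte[], byte[]>("my-topic",
> > >> >     "hello".getBytes()));
> > >> >
> > >> > With the defaults this behaves exactly like the current byte[] api; an
> > >> > application that wants richer types would plug in its own serializer
> > >> > classes instead.
> > >> >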
> > >> > Both serializers will implement the following interface.
> > >> >
> > >> > public interface Serializer<T> extends Configurable {
> > >> >     public byte[] serialize(String topic, T data, boolean isKey);
> > >> >
> > >> >     public void close();
> > >> > }
> > >> >
> > >> > This is more or less the same as what's in the old producer. The
> > >> > slight differences are (1) the serializer now only requires a
> > >> > parameter-less constructor; (2) the serializer has a configure() and a
> > >> > close() method for initialization and cleanup, respectively; (3) the
> > >> > serialize() method additionally takes the topic and an isKey indicator,
> > >> > both of which are useful for things like schema registration.
> > >> >
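> > >> > As a rough sketch (not part of the proposal itself), a trivial string
> > >> > serializer implementing this interface could look like the following;
> > >> > the "serializer.encoding" config key is made up for illustration:
> > >> >
> > >> > public class StringSerializer implements Serializer<String> {
> > >> >     private String encoding = "UTF8";
> > >> >
> > >> >     @Override
> > >> >     public void configure(Map<String, ?> configs) {
> > >> >         // parameter-less constructor plus configure() for initialization
> > >> >         Object enc = configs.get("serializer.encoding");
> > >> >         if (enc instanceof String)
> > >> >             encoding = (String) enc;
> > >> >     }
> > >> >
> > >> >     @Override
> > >> >     public byte[] serialize(String topic, String data, boolean isKey) {
> > >> >         // topic and isKey are unused here, but an Avro serializer could
> > >> >         // use them to look up or register a schema per topic/key.
> > >> >         try {
> > >> >             return data == null ? null : data.getBytes(encoding);
> > >> >         } catch (UnsupportedEncodingException e) {
> > >> >             throw new RuntimeException("Unsupported encoding " + encoding, e);
> > >> >         }
> > >> >     }
> > >> >
> > >> >     @Override
> > >> >     public void close() {
> > >> >         // nothing to release; an Avro serializer might close its schema
> > >> >         // registry client here
> > >> >     }
> > >> > }
> > >> >
> > >> > An application would then set the key/value serializer configs to such
> > >> > a class and declare a KafkaProducer<String, String>.
> > >> >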
> > >> > The detailed changes are included in KAFKA-1797. For completeness, I
> > >> > also made the corresponding changes for the new java consumer api.
> > >> >
> > >> > Note that the proposed api changes are incompatible with what's in the
> > >> > 0.8.2 branch. However, if those api changes are beneficial, it's
> > >> > probably better to include them now in the 0.8.2 release, rather than
> > >> > later.
> > >> >
> > >> > I'd like to discuss mainly two things in this thread.
> > >> > 1. Do people feel that the proposed api changes are reasonable?
> > >> > 2. Are there any concerns about including the api changes in the 0.8.2
> > >> > final release?
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Jun
> > >> >
> > >>
> >
> >
>
