Re: [DISCUSSION] adding the serializer api back to the new java producer

Jun Rao Tue, 02 Dec 2014 14:49:42 -0800

Rajiv,

Yes, that's possible within an organization. However, if you want to share
that implementation with other organizations, they will have to make code
changes, instead of just a config change.


Thanks,

Jun

On Tue, Dec 2, 2014 at 1:06 PM, Rajiv Kurian <ra...@signalfuse.com> wrote:

> Why can't the organization package the Avro implementation with a kafka
> client and distribute that library though? The risk of different users
> supplying the kafka client with different serializer/deserializer
> implementations still exists.
>
> On Tue, Dec 2, 2014 at 12:11 PM, Jun Rao <jun...@gmail.com> wrote:
>
> > Joel, Rajiv, Thunder,
> >
> > The issue with a separate ser/deser library is that if it's not part of
> the
> > client API, (1) users may not use it or (2) different users may use it in
> > different ways. For example, you can imagine that two Avro
> implementations
> > have different ways of instantiation (since it's not enforced by the
> client
> > API). This makes sharing such kind of libraries harder.
> >
> > Joel,
> >
> > As for reason about the data types, take an example of the consumer
> > application. It needs to deal with objects at some point. So the earlier
> > that type information is revealed, the clearer it is to the application.
> > Since the consumer client is the entry point where an application gets
> the
> > data,  if the type is enforced there, it makes it clear to all down
> stream
> > consumers.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Dec 2, 2014 at 10:06 AM, Joel Koshy <jjkosh...@gmail.com> wrote:
> >
> > > Re: pushing complexity of dealing with objects: we're talking about
> > > just a call to a serialize method to convert the object to a byte
> > > array right? Or is there more to it? (To me) that seems less
> > > cumbersome than having to interact with parameterized types. Actually,
> > > can you explain more clearly what you mean by <q>reason about what
> > > type of data is being sent</q> in your original email? I have some
> > > notion of what that means but it is a bit vague and you might have
> > > meant something else.
> > >
> > > Thanks,
> > >
> > > Joel
> > >
> > > On Tue, Dec 02, 2014 at 09:15:19AM -0800, Jun Rao wrote:
> > > > Joel,
> > > >
> > > > Thanks for the feedback.
> > > >
> > > > Yes, the raw bytes interface is simpler than the Generic api.
> However,
> > it
> > > > just pushes the complexity of dealing with the objects to the
> > > application.
> > > > We also thought about the layered approach. However, this may confuse
> > the
> > > > users since there is no single entry point and it's not clear which
> > > layer a
> > > > user should be using.
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Tue, Dec 2, 2014 at 12:34 AM, Joel Koshy <jjkosh...@gmail.com>
> > wrote:
> > > >
> > > > > > makes it hard to reason about what type of data is being sent to
> > > Kafka
> > > > > and
> > > > > > also makes it hard to share an implementation of the serializer.
> > For
> > > > > > example, to support Avro, the serialization logic could be quite
> > > involved
> > > > > > since it might need to register the Avro schema in some remote
> > > registry
> > > > > and
> > > > > > maintain a schema cache locally, etc. Without a serialization
> api,
> > > it's
> > > > > > impossible to share such an implementation so that people can
> > easily
> > > > > reuse.
> > > > > > We sort of overlooked this implication during the initial
> > discussion
> > > of
> > > > > the
> > > > > > producer api.
> > > > >
> > > > > Thanks for bringing this up and the patch.  My take on this is that
> > > > > any reasoning about the data itself is more appropriately handled
> > > > > outside of the core producer API. FWIW, I don't think this was
> > > > > _overlooked_ during the initial discussion of the producer API
> > > > > (especially since it was a significant change from the old
> producer).
> > > > > IIRC we believed at the time that there is elegance and flexibility
> > in
> > > > > a simple API that deals with raw bytes. I think it is more accurate
> > to
> > > > > say that this is a reversal of opinion for some (which is fine) but
> > > > > personally I'm still in the old camp :) i.e., I really like the
> > > > > simplicity of the current 0.8.2 producer API and find parameterized
> > > > > types/generics to be distracting and annoying; and IMO any
> > > > > data-specific handling is better absorbed at a higher-level than
> the
> > > > > core Kafka APIs - possibly by a (very thin) wrapper producer
> library.
> > > > > I don't quite see why it is difficult to share different wrapper
> > > > > implementations; or even ser-de libraries for that matter that
> people
> > > > > can invoke before sending to/reading from Kafka.
> > > > >
> > > > > That said I'm not opposed to the change - it's just that I prefer
> > > > > what's currently there. So I'm +0 on the proposal.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Joel
> > > > >
> > > > > On Mon, Nov 24, 2014 at 05:58:50PM -0800, Jun Rao wrote:
> > > > > > Hi, Everyone,
> > > > > >
> > > > > > I'd like to start a discussion on whether it makes sense to add
> the
> > > > > > serializer api back to the new java producer. Currently, the new
> > java
> > > > > > producer takes a byte array for both the key and the value. While
> > > this
> > > > > api
> > > > > > is simple, it pushes the serialization logic into the
> application.
> > > This
> > > > > > makes it hard to reason about what type of data is being sent to
> > > Kafka
> > > > > and
> > > > > > also makes it hard to share an implementation of the serializer.
> > For
> > > > > > example, to support Avro, the serialization logic could be quite
> > > involved
> > > > > > since it might need to register the Avro schema in some remote
> > > registry
> > > > > and
> > > > > > maintain a schema cache locally, etc. Without a serialization
> api,
> > > it's
> > > > > > impossible to share such an implementation so that people can
> > easily
> > > > > reuse.
> > > > > > We sort of overlooked this implication during the initial
> > discussion
> > > of
> > > > > the
> > > > > > producer api.
> > > > > >
> > > > > > So, I'd like to propose an api change to the new producer by
> adding
> > > back
> > > > > > the serializer api similar to what we had in the old producer.
> > > Specially,
> > > > > > the proposed api changes are the following.
> > > > > >
> > > > > > First, we change KafkaProducer to take generic types K and V for
> > the
> > > key
> > > > > > and the value, respectively.
> > > > > >
> > > > > > public class KafkaProducer<K,V> implements Producer<K,V> {
> > > > > >
> > > > > >     public Future<RecordMetadata> send(ProducerRecord<K,V>
> record,
> > > > > Callback
> > > > > > callback);
> > > > > >
> > > > > >     public Future<RecordMetadata> send(ProducerRecord<K,V>
> record);
> > > > > > }
> > > > > >
> > > > > > Second, we add two new configs, one for the key serializer and
> > > another
> > > > > for
> > > > > > the value serializer. Both serializers will default to the byte
> > array
> > > > > > implementation.
> > > > > >
> > > > > > public class ProducerConfig extends AbstractConfig {
> > > > > >
> > > > > >     .define(KEY_SERIALIZER_CLASS_CONFIG, Type.CLASS,
> > > > > > "org.apache.kafka.clients.producer.ByteArraySerializer",
> > > Importance.HIGH,
> > > > > > KEY_SERIALIZER_CLASS_DOC)
> > > > > >     .define(VALUE_SERIALIZER_CLASS_CONFIG, Type.CLASS,
> > > > > > "org.apache.kafka.clients.producer.ByteArraySerializer",
> > > Importance.HIGH,
> > > > > > VALUE_SERIALIZER_CLASS_DOC);
> > > > > > }
> > > > > >
> > > > > > Both serializers will implement the following interface.
> > > > > >
> > > > > > public interface Serializer<T> extends Configurable {
> > > > > >     public byte[] serialize(String topic, T data, boolean isKey);
> > > > > >
> > > > > >     public void close();
> > > > > > }
> > > > > >
> > > > > > This is more or less the same as what's in the old producer. The
> > > slight
> > > > > > differences are (1) the serializer now only requires a
> > parameter-less
> > > > > > constructor; (2) the serializer has a configure() and a close()
> > > method
> > > > > for
> > > > > > initialization and cleanup, respectively; (3) the serialize()
> > method
> > > > > > additionally takes the topic and an isKey indicator, both of
> which
> > > are
> > > > > > useful for things like schema registration.
> > > > > >
> > > > > > The detailed changes are included in KAFKA-1797. For
> completeness,
> > I
> > > also
> > > > > > made the corresponding changes for the new java consumer api as
> > well.
> > > > > >
> > > > > > Note that the proposed api changes are incompatible with what's
> in
> > > the
> > > > > > 0.8.2 branch. However, if those api changes are beneficial, it's
> > > probably
> > > > > > better to include them now in the 0.8.2 release, rather than
> later.
> > > > > >
> > > > > > I'd like to discuss mainly two things in this thread.
> > > > > > 1. Do people feel that the proposed api changes are reasonable?
> > > > > > 2. Are there any concerns of including the api changes in the
> 0.8.2
> > > final
> > > > > > release?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > >
> > > > >
> > >
> > >
> >
>

Re: [DISCUSSION] adding the serializer api back to the new java producer

Reply via email to