Re: [DISCUSSION] adding the serializer api back to the new java producer

Jan Filipiak Tue, 02 Dec 2014 18:51:33 -0800

Hello Everyone,

I would very much appreciate it if someone could provide me a real-world example where it is more convenient to implement the serializers instead of just making sure to provide byte arrays.

The code we came up with explicitly avoids the serializer api. I think it is common understanding that if you want to transport data you need to have it as a byte array.

If at all, I personally would like to have a serializer interface that takes the same types as the producer:


public interface Serializer<K,V> extends Configurable {
    public byte[] serializeKey(K data);
    public byte[] serializeValue(V data);
    public void close();
}

This would avoid long serialize implementations with branches like "switch(topic)" or "if(isKey)". Furthermore, a serializer per topic makes more sense in my opinion. It feels natural to have a one-to-one relationship from types to topics, or at least only a few partitions per type. But as we inherit the type from the producer, we would have to create many producers. This would create additional unnecessary connections to the brokers. With the serializers we create a one-type-to-all-topics relationship, and the only type that satisfies that is the byte array or Object. Am I missing something here? As said in the beginning, I would like to see the use case that really benefits from using the serializers. I think in theory they sound great, but they cause real practical issues that may lead users to wrong decisions.
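A minimal sketch of the contrast described above, not taken from any patch. The interface and class names, the "users" and "orders" topics, and the "user:"/"order:" encodings are all made up for illustration, and Configurable is a local stand-in so the example compiles without Kafka on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

class SerializerShapes {

    // Local stand-in for Kafka's Configurable (assumption, declared only for this sketch).
    interface Configurable {
        void configure(Map<String, ?> configs);
    }

    // Shape from the quoted proposal: a single type T plus topic and isKey arguments.
    interface SingleTypeSerializer<T> extends Configurable {
        byte[] serialize(String topic, T data, boolean isKey);
        void close();
    }

    // Shape suggested in this mail: key and value types mirror the producer's K and V.
    interface KeyValueSerializer<K, V> extends Configurable {
        byte[] serializeKey(K data);
        byte[] serializeValue(V data);
        void close();
    }

    // One producer for several topics has to be typed as Object and branch at runtime.
    static class BranchingSerializer implements SingleTypeSerializer<Object> {
        @Override
        public void configure(Map<String, ?> configs) { }

        @Override
        public byte[] serialize(String topic, Object data, boolean isKey) {
            if (isKey) {
                return data.toString().getBytes(StandardCharsets.UTF_8);
            }
            switch (topic) {
                case "users":   // hypothetical topic
                    return ("user:" + data).getBytes(StandardCharsets.UTF_8);
                case "orders":  // hypothetical topic
                    return ("order:" + data).getBytes(StandardCharsets.UTF_8);
                default:
                    throw new IllegalArgumentException("no serializer for topic " + topic);
            }
        }

        @Override
        public void close() { }
    }

    // With separate key/value methods the types are checked at compile time, no branches.
    static class UserSerializer implements KeyValueSerializer<Long, String> {
        @Override
        public void configure(Map<String, ?> configs) { }

        @Override
        public byte[] serializeKey(Long id) {
            return id.toString().getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public byte[] serializeValue(String name) {
            return ("user:" + name).getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public void close() { }
    }
}
```

The second shape still ties one key/value type pair to one producer, which is the type => topic problem raised in this mail; the sketch only shows where the branching goes away.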


-1 for putting the serializers back in.

Looking forward to replies that can show me the benefit of serializers, and especially how the type => topic relationship can be handled nicely.

Best
Jan



On 25.11.2014 02:58, Jun Rao wrote:

Hi, Everyone,

I'd like to start a discussion on whether it makes sense to add the
serializer api back to the new java producer. Currently, the new java
producer takes a byte array for both the key and the value. While this api
is simple, it pushes the serialization logic into the application. This
makes it hard to reason about what type of data is being sent to Kafka and
also makes it hard to share an implementation of the serializer. For
example, to support Avro, the serialization logic could be quite involved
since it might need to register the Avro schema in some remote registry and
maintain a schema cache locally, etc. Without a serialization api, it's
impossible to share such an implementation so that people can easily reuse it.
We sort of overlooked this implication during the initial discussion of the
producer api.

So, I'd like to propose an api change to the new producer by adding back
the serializer api similar to what we had in the old producer. Specifically,
the proposed api changes are the following.

First, we change KafkaProducer to take generic types K and V for the key
and the value, respectively.

public class KafkaProducer<K,V> implements Producer<K,V> {

     public Future<RecordMetadata> send(ProducerRecord<K,V> record,
                                        Callback callback);

     public Future<RecordMetadata> send(ProducerRecord<K,V> record);
}

Second, we add two new configs, one for the key serializer and another for
the value serializer. Both serializers will default to the byte array
implementation.

public class ProducerConfig extends AbstractConfig {

     .define(KEY_SERIALIZER_CLASS_CONFIG, Type.CLASS,
             "org.apache.kafka.clients.producer.ByteArraySerializer",
             Importance.HIGH, KEY_SERIALIZER_CLASS_DOC)
     .define(VALUE_SERIALIZER_CLASS_CONFIG, Type.CLASS,
             "org.apache.kafka.clients.producer.ByteArraySerializer",
             Importance.HIGH, VALUE_SERIALIZER_CLASS_DOC);
}

Both serializers will implement the following interface.

public interface Serializer<T> extends Configurable {
     public byte[] serialize(String topic, T data, boolean isKey);

     public void close();
}

This is more or less the same as what's in the old producer. The slight
differences are (1) the serializer now only requires a parameter-less
constructor; (2) the serializer has a configure() and a close() method for
initialization and cleanup, respectively; (3) the serialize() method
additionally takes the topic and an isKey indicator, both of which are
useful for things like schema registration.
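
As a rough illustration of points (1) to (3), here is a sketch of a string serializer written against the interface above. It is not the code in KAFKA-1797: the class names and the "serializer.encoding" config key are assumptions made for this example, and Configurable and Serializer are redeclared locally so the sketch compiles on its own.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;

class ProposedSerializerSketch {

    // Local stand-in for Kafka's Configurable (assumption, declared only for this sketch).
    interface Configurable {
        void configure(Map<String, ?> configs);
    }

    // Same shape as the interface proposed above.
    interface Serializer<T> extends Configurable {
        byte[] serialize(String topic, T data, boolean isKey);
        void close();
    }

    static class ConfigurableStringSerializer implements Serializer<String> {
        private Charset charset = StandardCharsets.UTF_8;

        // (1) Only a parameter-less constructor is required.
        public ConfigurableStringSerializer() { }

        // (2) Initialization happens in configure(); the key name is hypothetical.
        @Override
        public void configure(Map<String, ?> configs) {
            Object encoding = configs.get("serializer.encoding");
            if (encoding != null) {
                charset = Charset.forName(encoding.toString());
            }
        }

        // (3) topic and isKey are available but unused here; a schema-aware
        // serializer could use them to look up or register a schema.
        @Override
        public byte[] serialize(String topic, String data, boolean isKey) {
            return data == null ? null : data.getBytes(charset);
        }

        // (2) Nothing to release in this sketch.
        @Override
        public void close() { }
    }
}
```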

The detailed changes are included in KAFKA-1797. For completeness, I also
made the corresponding changes for the new java consumer api as well.

Note that the proposed api changes are incompatible with what's in the
0.8.2 branch. However, if those api changes are beneficial, it's probably
better to include them now in the 0.8.2 release, rather than later.

I'd like to discuss mainly two things in this thread.
1. Do people feel that the proposed api changes are reasonable?
2. Are there any concerns about including the api changes in the 0.8.2 final
release?

Thanks,

Jun
