Sorry for adding noise, but I think Jan has a very good point: applications shouldn't be forced to create multiple producers simply to wire in the proper Serializer. It's an artificial restriction that wastes resources.
It's a common thing for us to create a single producer and slap different "views" on top of it, one per topic it writes to (rough sketch below). Furthermore, requiring that a producer specify both a K and a V type is clumsy for topics that don't have a key: the signature would look like KafkaProducer<Void, MyObject>, where the Void type is unnecessary noise that also pollutes other types like ProducerRecord. The fewer opinions Kafka has about application-level concerns, the better.
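Here is roughly what such a view looks like, written against the plain byte[] producer api (i.e. without the proposed generics). TopicView and ValueWriter are names I made up for this sketch; the real shape would be dictated by the application:

import java.util.concurrent.Future;

import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// How the application turns a value into bytes: one small implementation
// per type, owned by the application rather than by the producer.
public interface ValueWriter<V> {
    byte[] toBytes(V value);
}

// A typed "view" over a single shared byte[] producer. Serialization stays
// an application-level concern, and one producer (one set of broker
// connections) serves any number of topics and value types.
public class TopicView<V> {
    private final Producer producer; // the shared byte[] producer
    private final String topic;
    private final ValueWriter<V> writer;

    public TopicView(Producer producer, String topic, ValueWriter<V> writer) {
        this.producer = producer;
        this.topic = topic;
        this.writer = writer;
    }

    // An unkeyed send: no Void type parameter in sight.
    public Future<RecordMetadata> send(V value) {
        return producer.send(new ProducerRecord(topic, writer.toBytes(value)));
    }
}

Each topic gets its own small, strongly typed view over the same producer, and the Void problem disappears because a view for an unkeyed topic simply doesn't expose a key parameter.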
Cheers,
Philippe

On Tue, Dec 2, 2014 at 9:50 PM, Jan Filipiak <jan.filip...@trivago.com> wrote:

> Hello Everyone,
>
> I would very much appreciate it if someone could provide me a real-world
> example where it is more convenient to implement the serializers instead
> of just making sure to provide byte arrays.
>
> The code we came up with explicitly avoids the serializer api. I think it
> is common understanding that if you want to transport data you need to
> have it as a byte array.
>
> If at all, I personally would like to have a serializer interface that
> takes the same types as the producer:
>
> public interface Serializer<K,V> extends Configurable {
>     public byte[] serializeKey(K data);
>     public byte[] serializeValue(V data);
>     public void close();
> }
>
> This would avoid long serialize implementations with branches like
> "switch(topic)" or "if(isKey)". Furthermore, a serializer per topic makes
> more sense in my opinion. It feels natural to have a one-to-one
> relationship from types to topics, or at least only a few topics per
> type. But as we inherit the type from the producer, we would have to
> create many producers, and that would create additional, unnecessary
> connections to the brokers. With the serializers we get a
> one-type-to-all-topics relationship, and the only types that satisfy
> that are the byte array and Object. Am I missing something here? As said
> in the beginning, I would like to see a use case that really benefits
> from using the serializers. In theory they sound great, but they cause
> real practical issues that may lead users to wrong decisions.
>
> -1 for putting the serializers back in.
>
> Looking forward to replies that can show me the benefit of serializers,
> and especially how the type => topic relationship can be handled nicely.
>
> Best
> Jan
>
>
> On 25.11.2014 02:58, Jun Rao wrote:
>
>> Hi, Everyone,
>>
>> I'd like to start a discussion on whether it makes sense to add the
>> serializer api back to the new java producer. Currently, the new java
>> producer takes a byte array for both the key and the value. While this
>> api is simple, it pushes the serialization logic into the application.
>> This makes it hard to reason about what type of data is being sent to
>> Kafka and also makes it hard to share an implementation of the
>> serializer. For example, to support Avro, the serialization logic could
>> be quite involved, since it might need to register the Avro schema in
>> some remote registry, maintain a schema cache locally, etc. Without a
>> serialization api, it's impossible to share such an implementation so
>> that people can easily reuse it. We sort of overlooked this implication
>> during the initial discussion of the producer api.
>>
>> So, I'd like to propose an api change to the new producer by adding
>> back a serializer api similar to what we had in the old producer.
>> Specifically, the proposed api changes are the following.
>>
>> First, we change KafkaProducer to take generic types K and V for the
>> key and the value, respectively.
>>
>> public class KafkaProducer<K,V> implements Producer<K,V> {
>>
>>     public Future<RecordMetadata> send(ProducerRecord<K,V> record,
>>                                        Callback callback);
>>
>>     public Future<RecordMetadata> send(ProducerRecord<K,V> record);
>> }
>>
>> Second, we add two new configs, one for the key serializer and another
>> for the value serializer. Both serializers will default to the byte
>> array implementation.
>>
>> public class ProducerConfig extends AbstractConfig {
>>
>>     .define(KEY_SERIALIZER_CLASS_CONFIG, Type.CLASS,
>>         "org.apache.kafka.clients.producer.ByteArraySerializer",
>>         Importance.HIGH, KEY_SERIALIZER_CLASS_DOC)
>>     .define(VALUE_SERIALIZER_CLASS_CONFIG, Type.CLASS,
>>         "org.apache.kafka.clients.producer.ByteArraySerializer",
>>         Importance.HIGH, VALUE_SERIALIZER_CLASS_DOC);
>> }
>>
>> Both serializers will implement the following interface.
>>
>> public interface Serializer<T> extends Configurable {
>>     public byte[] serialize(String topic, T data, boolean isKey);
>>
>>     public void close();
>> }
>>
>> This is more or less the same as what's in the old producer. The slight
>> differences are (1) the serializer now only requires a parameter-less
>> constructor; (2) the serializer has a configure() and a close() method
>> for initialization and cleanup, respectively; (3) the serialize()
>> method additionally takes the topic and an isKey indicator, both of
>> which are useful for things like schema registration.
>>
>> The detailed changes are included in KAFKA-1797. For completeness, I
>> also made the corresponding changes for the new java consumer api.
>>
>> Note that the proposed api changes are incompatible with what's in the
>> 0.8.2 branch. However, if those api changes are beneficial, it's
>> probably better to include them now, in the 0.8.2 release, rather than
>> later.
>>
>> I'd like to discuss mainly two things in this thread.
>> 1. Do people feel that the proposed api changes are reasonable?
>> 2. Are there any concerns about including the api changes in the 0.8.2
>> final release?
>>
>> Thanks,
>>
>> Jun
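P.S. To make Jan's branching point concrete: below is a rough sketch of my
own (not code from KAFKA-1797) of what a shared serializer could look like
under the proposed Serializer<T> interface, with a hypothetical
SchemaRegistry standing in for the remote registry Jun mentions. Note how
the topic and isKey parameters immediately turn into branching and
per-topic state inside the one class every topic must share:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical client for the remote registry; not a real Kafka type.
interface SchemaRegistry {
    int register(String subject, String schema);
}

// Serializer<T> and Configurable here are the interfaces from Jun's
// proposal above, not shipping Kafka classes.
public class AvroishSerializer<T> implements Serializer<T> {

    private SchemaRegistry registry;
    // Local cache of schema ids, one entry per (topic, isKey) subject.
    private final Map<String, Integer> schemaIdCache =
        new ConcurrentHashMap<String, Integer>();

    public void configure(Map<String, ?> configs) {
        // would read the registry url from configs and build the client
    }

    public byte[] serialize(String topic, T data, boolean isKey) {
        // The per-topic / per-key branching Jan describes: one instance
        // has to cover every topic the producer writes to.
        String subject = isKey ? topic + "-key" : topic + "-value";
        Integer schemaId = schemaIdCache.get(subject);
        if (schemaId == null) {
            schemaId = registry.register(subject, schemaFor(data));
            schemaIdCache.put(subject, schemaId);
        }
        return encode(schemaId, data); // schema id first, then the payload
    }

    public void close() {
        schemaIdCache.clear();
    }

    // Stubs standing in for the actual Avro machinery.
    private String schemaFor(T data) { return ""; }
    private byte[] encode(int schemaId, T data) { return new byte[0]; }
}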