Sorry for adding noise, but I think Jan has a very good point: applications shouldn't be forced to create multiple producers simply to wire in the proper Serializer. It's an artificial restriction that wastes resources.
It's a common thing for us to create a single producer and slap different "views" on top of it, one per topic it writes to (rough sketch below). Furthermore, requiring that a producer specify both a K and a V type is clumsy for topics that don't have a key: the signature would look like KafkaProducer<Void, MyObject>, where the Void type is unnecessary noise that also pollutes other types like ProducerRecord. The fewer opinions Kafka has about application-level concerns, the better.
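Here is roughly what such a view looks like, written against the plain byte[] producer api (i.e. without the proposed generics). TopicView and ValueWriter are names I made up for this sketch; the real shape would be dictated by the application:

import java.util.concurrent.Future;

import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// How the application turns a value into bytes: one small implementation
// per type, owned by the application rather than by the producer.
public interface ValueWriter<V> {
    byte[] toBytes(V value);
}

// A typed "view" over a single shared byte[] producer. Serialization stays
// an application-level concern, and one producer (one set of broker
// connections) serves any number of topics and value types.
public class TopicView<V> {
    private final Producer producer; // the shared byte[] producer
    private final String topic;
    private final ValueWriter<V> writer;

    public TopicView(Producer producer, String topic, ValueWriter<V> writer) {
        this.producer = producer;
        this.topic = topic;
        this.writer = writer;
    }

    // An unkeyed send: no Void type parameter in sight.
    public Future<RecordMetadata> send(V value) {
        return producer.send(new ProducerRecord(topic, writer.toBytes(value)));
    }
}

Each topic gets its own small, strongly typed view over the same producer, and the Void problem disappears because a view for an unkeyed topic simply doesn't expose a key parameter.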
Cheers,
Philippe

On Tue, Dec 2, 2014 at 9:50 PM, Jan Filipiak <jan.filip...@trivago.com> wrote:

> Hello Everyone,
>
> I would very much appreciate it if someone could provide me a real-world
> example where it is more convenient to implement the serializers instead
> of just making sure to provide byte arrays.
>
> The code we came up with explicitly avoids the serializer api. I think it
> is common understanding that if you want to transport data you need to
> have it as a byte array.
>
> If at all, I personally would like to have a serializer interface that
> takes the same types as the producer:
>
> public interface Serializer<K,V> extends Configurable {
>     public byte[] serializeKey(K data);
>     public byte[] serializeValue(V data);
>     public void close();
> }
>
> This would avoid long serialize implementations with branches like
> "switch(topic)" or "if(isKey)". Furthermore, a serializer per topic makes
> more sense in my opinion. It feels natural to have a one-to-one
> relationship from types to topics, or at least only a few topics per
> type. But as we inherit the type from the producer, we would have to
> create many producers, and that would create additional, unnecessary
> connections to the brokers. With the serializers we get a
> one-type-to-all-topics relationship, and the only types that satisfy
> that are the byte array and Object. Am I missing something here? As said
> in the beginning, I would like to see a use case that really benefits
> from using the serializers. In theory they sound great, but they cause
> real practical issues that may lead users to wrong decisions.
>
> -1 for putting the serializers back in.
>
> Looking forward to replies that can show me the benefit of serializers,
> and especially how the type => topic relationship can be handled nicely.
>
> Best
> Jan
>
>
> On 25.11.2014 02:58, Jun Rao wrote:
>
>> Hi, Everyone,
>>
>> I'd like to start a discussion on whether it makes sense to add the
>> serializer api back to the new java producer. Currently, the new java
>> producer takes a byte array for both the key and the value. While this
>> api is simple, it pushes the serialization logic into the application.
>> This makes it hard to reason about what type of data is being sent to
>> Kafka and also makes it hard to share an implementation of the
>> serializer. For example, to support Avro, the serialization logic could
>> be quite involved, since it might need to register the Avro schema in
>> some remote registry, maintain a schema cache locally, etc. Without a
>> serialization api, it's impossible to share such an implementation so
>> that people can easily reuse it. We sort of overlooked this implication
>> during the initial discussion of the producer api.
>>
>> So, I'd like to propose an api change to the new producer by adding
>> back a serializer api similar to what we had in the old producer.
>> Specifically, the proposed api changes are the following.
>>
>> First, we change KafkaProducer to take generic types K and V for the
>> key and the value, respectively.
>>
>> public class KafkaProducer<K,V> implements Producer<K,V> {
>>
>>     public Future<RecordMetadata> send(ProducerRecord<K,V> record,
>>                                        Callback callback);
>>
>>     public Future<RecordMetadata> send(ProducerRecord<K,V> record);
>> }
>>
>> Second, we add two new configs, one for the key serializer and another
>> for the value serializer. Both serializers will default to the byte
>> array implementation.
>>
>> public class ProducerConfig extends AbstractConfig {
>>
>>     .define(KEY_SERIALIZER_CLASS_CONFIG, Type.CLASS,
>>         "org.apache.kafka.clients.producer.ByteArraySerializer",
>>         Importance.HIGH, KEY_SERIALIZER_CLASS_DOC)
>>     .define(VALUE_SERIALIZER_CLASS_CONFIG, Type.CLASS,
>>         "org.apache.kafka.clients.producer.ByteArraySerializer",
>>         Importance.HIGH, VALUE_SERIALIZER_CLASS_DOC);
>> }
>>
>> Both serializers will implement the following interface.
>>
>> public interface Serializer<T> extends Configurable {
>>     public byte[] serialize(String topic, T data, boolean isKey);
>>
>>     public void close();
>> }
>>
>> This is more or less the same as what's in the old producer. The slight
>> differences are (1) the serializer now only requires a parameter-less
>> constructor; (2) the serializer has a configure() and a close() method
>> for initialization and cleanup, respectively; (3) the serialize()
>> method additionally takes the topic and an isKey indicator, both of
>> which are useful for things like schema registration.
>>
>> The detailed changes are included in KAFKA-1797. For completeness, I
>> also made the corresponding changes for the new java consumer api.
>>
>> Note that the proposed api changes are incompatible with what's in the
>> 0.8.2 branch. However, if those api changes are beneficial, it's
>> probably better to include them now, in the 0.8.2 release, rather than
>> later.
>>
>> I'd like to discuss mainly two things in this thread.
>> 1. Do people feel that the proposed api changes are reasonable?
>> 2. Are there any concerns about including the api changes in the 0.8.2
>> final release?
>>
>> Thanks,
>>
>> Jun
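P.S. To make Jan's branching point concrete: below is a rough sketch of my
own (not code from KAFKA-1797) of what a shared serializer could look like
under the proposed Serializer<T> interface, with a hypothetical
SchemaRegistry standing in for the remote registry Jun mentions. Note how
the topic and isKey parameters immediately turn into branching and
per-topic state inside the one class every topic must share:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical client for the remote registry; not a real Kafka type.
interface SchemaRegistry {
    int register(String subject, String schema);
}

// Serializer<T> and Configurable here are the interfaces from Jun's
// proposal above, not shipping Kafka classes.
public class AvroishSerializer<T> implements Serializer<T> {

    private SchemaRegistry registry;
    // Local cache of schema ids, one entry per (topic, isKey) subject.
    private final Map<String, Integer> schemaIdCache =
        new ConcurrentHashMap<String, Integer>();

    public void configure(Map<String, ?> configs) {
        // would read the registry url from configs and build the client
    }

    public byte[] serialize(String topic, T data, boolean isKey) {
        // The per-topic / per-key branching Jan describes: one instance
        // has to cover every topic the producer writes to.
        String subject = isKey ? topic + "-key" : topic + "-value";
        Integer schemaId = schemaIdCache.get(subject);
        if (schemaId == null) {
            schemaId = registry.register(subject, schemaFor(data));
            schemaIdCache.put(subject, schemaId);
        }
        return encode(schemaId, data); // schema id first, then the payload
    }

    public void close() {
        schemaIdCache.clear();
    }

    // Stubs standing in for the actual Avro machinery.
    private String schemaFor(T data) { return ""; }
    private byte[] encode(int schemaId, T data) { return new byte[0]; }
}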