[ https://issues.apache.org/jira/browse/KAFKA-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297898#comment-15297898 ]
Ismael Juma commented on KAFKA-3744:
------------------------------------

Hi [~davek22]. A change to the message format would require a KIP:

https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

You may also choose to email the mailing list before doing the KIP to get feedback from a wider group. There are other ways of achieving something like this (e.g. https://github.com/confluentinc/schema-registry), with different trade-offs.

> Message format needs to identify serializer
> -------------------------------------------
>
>                 Key: KAFKA-3744
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3744
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: David Kay
>            Priority: Minor
>
> https://issues.apache.org/jira/browse/KAFKA-3698 was recently resolved with
> https://github.com/apache/kafka/commit/27a19b964af35390d78e1b3b50bc03d23327f4d0.
> But the Kafka documentation on message formats needs to be more explicit for
> new users. Section 1.3, Step 4 says "Send some messages" and takes lines of
> text from the command line. Slide 104 of a beginner's guide
> (http://www.slideshare.net/miguno/apache-kafka-08-basic-training-verisign)
> says:
> {noformat}
> Kafka does not care about data format of msg payload
> Up to developer to handle serialization/deserialization
> Common choices: Avro, JSON
> {noformat}
> If one producer sends lines of console text, another sends Avro, a third
> sends JSON, and a fourth sends CBOR, how does the consumer identify which
> deserializer to use for the payload? The commit includes an opaque K-byte key
> that could potentially carry a codec identifier, but provides no guidance on
> how to use it:
> {quote}
> "Leaving the key and value opaque is the right decision: there is a great
> deal of progress being made on serialization libraries right now, and any
> particular choice is unlikely to be right for all uses. Needless to say a
> particular application using Kafka would likely mandate a particular
> serialization type as part of its usage."
> {quote}
> Mandating any particular serialization is as unrealistic as mandating a
> single MIME type for all web content. There must be a way to signal the
> serialization used to produce each message's V-byte payload, and documenting
> the existence of even a rudimentary codec registry with a few values (text,
> Avro, JSON, CBOR) would establish the pattern for future serialization
> libraries.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)