[ 
https://issues.apache.org/jira/browse/KAFKA-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300075#comment-15300075
 ] 

David Kay commented on KAFKA-3744:
----------------------------------

Hi Ismael, thanks for the reply.

I can submit a KIP, but I don't believe this proposal meets the stated 
requirements for a KIP.  It does not change either the message format or the 
on-disk format in any manner that would affect current software.  The currently 
documented structure includes a one-byte "attributes" field that defines bits 
0-3 and reserves bits 4-7 for future use.  This proposal assigns meaning to 
bits 4-5 which were previously undefined, and leaves bits 6-7 reserved for 
future use.

All current producer, consumer, and messaging software would continue to run 
unchanged if this proposal were adopted.  Future producers and consumers could 
optionally use the two attribute bits, but the messaging software is unaffected 
by whether those bits remain undefined or are used for something.

Let me know if you think a KIP would still be helpful.

> Message format needs to identify serializer
> -------------------------------------------
>
>                 Key: KAFKA-3744
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3744
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: David Kay
>            Priority: Minor
>
> https://issues.apache.org/jira/browse/KAFKA-3698 was recently resolved with 
> https://github.com/apache/kafka/commit/27a19b964af35390d78e1b3b50bc03d23327f4d0.
> But Kafka documentation on message formats needs to be more explicit for new 
> users. Section 1.3 Step 4 says: "Send some messages" and takes lines of text 
> from the command line. Beginner's guide 
> (http://www.slideshare.net/miguno/apache-kafka-08-basic-training-verisign 
> Slide 104 says:
> {noformat}
>    Kafka does not care about data format of msg payload
>    Up to developer to handle serialization/deserialization
>       Common choices: Avro, JSON
> {noformat}
> If one producer sends lines of console text, another producer sends Avro, a 
> third producer sends JSON, and a fourth sends CBOR, how does the consumer 
> identify which deserializer to use for the payload?  The commit includes an 
> opaque K byte Key that could potentially include a codec identifier, but 
> provides no guidance on how to use it:
> {quote}
> "Leaving the key and value opaque is the right decision: there is a great 
> deal of progress being made on serialization libraries right now, and any 
> particular choice is unlikely to be right for all uses. Needless to say a 
> particular application using Kafka would likely mandate a particular 
> serialization type as part of its usage."
> {quote}
> Mandating any particular serialization is as unrealistic as mandating a 
> single mime-type for all web content.  There must be a way to signal the 
> serialization used to produce this message's V byte payload, and documenting 
> the existence of even a rudimentary codec registry with a few values (text, 
> Avro, JSON, CBOR) would establish the pattern to be used for future 
> serialization libraries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to