You could do what you are asking with a custom encoder/decoder so that the message bytes are "messageType+AvroMessage". The message can be whatever byte structure you want, not just the Avro binary, e.g. https://github.com/linkedin/camus/blob/master/camus-kafka-coders/src/main/java/com/linkedin/camus/etl/kafka/coders/KafkaAvroMessageEncoder.java / https://github.com/linkedin/camus/blob/master/camus-kafka-coders/src/main/java/com/linkedin/camus/etl/kafka/coders/KafkaAvroMessageDecoder.java. Or even write the messageType as the key https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/message/MessageAndMetadata.scala#L30 and read that in your consumer.
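For example, a minimal sketch of that framing (the class and method names here are illustrative, not the Camus API):

```java
// Minimal sketch of a type-prefixed message format: the first byte
// carries the message type, the rest is the Avro-serialized payload.
// Names (TypedMessageCodec, msgType) are illustrative, not Camus API.
import java.nio.ByteBuffer;

public class TypedMessageCodec {

    // Frame the message as [1 byte type][Avro payload bytes].
    public static byte[] encode(byte msgType, byte[] avroPayload) {
        ByteBuffer buf = ByteBuffer.allocate(1 + avroPayload.length);
        buf.put(msgType);
        buf.put(avroPayload);
        return buf.array();
    }

    // Peek at the type without touching the payload; a consumer that
    // doesn't care about this type can skip Avro decoding entirely.
    public static byte peekType(byte[] message) {
        return message[0];
    }

    // Strip the type byte and hand the remainder to the Avro decoder.
    public static byte[] payload(byte[] message) {
        byte[] avro = new byte[message.length - 1];
        System.arraycopy(message, 1, avro, 0, avro.length);
        return avro;
    }
}
```

With the key approach it is even simpler: the consumer checks messageAndMetadata.key() and only deserializes message() when the key matches a type it cares about. Note, though, that the full message bytes still cross the wire either way.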
I am not sure exactly what you are trying to solve, because if you have a 5MB message going over the wire and your consumer doesn't need that data, then you just wasted transport costs in your infrastructure... in which case Jayesh's suggestion makes sense: the Kafka message becomes a pointer you use to query another system that supports reliable storage of larger data/files (lots of options here; a sketch of that pattern follows after the quoted thread below). That is also a typical type of deployment in these scenarios.

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/

On Fri, Jan 2, 2015 at 8:25 PM, Mukesh Jha <me.mukesh....@gmail.com> wrote:

> Indeed, my message size varies between ~500kb and ~5mb per Avro message.
>
> I am using Kafka as I need a scalable pub-sub messaging architecture with
> multiple producers and consumers and a guarantee of delivery.
> Keeping data on a filesystem or HDFS won't give me that.
>
> Also, in the link below [1] there is LinkedIn's performance benchmark of
> Kafka w.r.t. message size, which shows that Kafka's throughput increases
> with messages of size ~100kb+.
>
> Agreed that for Kafka a record is key+value; I'm wondering if Kafka can
> give us a way to sneak a peek at a record's metadata via its key.
>
> [1]
> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
>
> On 3 Jan 2015 01:27, "Jayesh Thakrar" <j_thak...@yahoo.com.invalid> wrote:
>
> > Just wondering, Mukesh - the reason you want this feature is because your
> > value payload is not small (tens of KB). I don't know if that is the
> > right usage of Kafka. It might be worthwhile to store the Avro files in a
> > filesystem (regular, a cluster FS, HDFS, or even HBase) and the value in
> > your Kafka message can be the reference or URI for the Avro file.
> >
> > That way you make the best use of each system's features and strengths.
> >
> > Kafka does have an API to get metadata - the topics, partitions, the
> > primary for a partition, etc. If we consider a key-value pair as a
> > "record", then what you are looking for is to get a part of the record
> > (i.e. key only) and not the whole record - so I would still consider
> > that a data query/API.
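For reference, a minimal sketch of the pointer pattern Jayesh describes, using the 0.8.x Java producer API. The storeAvroFile helper, topic name, and HDFS path scheme are assumptions for illustration; any reliable blob store (HDFS, HBase, S3, etc.) would do:

```java
// Sketch of the pointer pattern: persist the large Avro payload in
// bulk storage and publish only a small reference through Kafka.
import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class PointerProducer {

    // Hypothetical helper: writes the payload to the blob store and
    // returns its URI. The actual write is elided here.
    static String storeAvroFile(byte[] avroBytes) {
        String uri = "hdfs://namenode/avro/" + java.util.UUID.randomUUID() + ".avro";
        // ... write avroBytes to the store ...
        return uri;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));

        byte[] largeAvroRecord = new byte[5 * 1024 * 1024]; // the ~5MB payload
        String uri = storeAvroFile(largeAvroRecord);

        // The Kafka message is now a few bytes: messageType as the key,
        // the storage URI as the value. Consumers fetch the blob only
        // when they actually need it.
        producer.send(new KeyedMessage<String, String>("avro-events", "typeA", uri));
        producer.close();
    }
}
```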