You could do what you are asking with a custom encoder/decoder, so that the
message bytes are made up of "messageType + AvroMessage". The message can
be whatever byte structure you want, not just the raw Avro binary, e.g.
https://github.com/linkedin/camus/blob/master/camus-kafka-coders/src/main/java/com/linkedin/camus/etl/kafka/coders/KafkaAvroMessageEncoder.java
/
https://github.com/linkedin/camus/blob/master/camus-kafka-coders/src/main/java/com/linkedin/camus/etl/kafka/coders/KafkaAvroMessageDecoder.java
Alternatively, write the messageType as the key
https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/message/MessageAndMetadata.scala#L30
and read that in your consumer.
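A minimal sketch of that framing scheme (the class and method names below are hypothetical, for illustration only - this is not the Camus API): prepend a one-byte message type to the serialized Avro body, so a consumer can inspect the type without deserializing the payload.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class TypedMessageCodec {

    // Producer side: frame the payload as "messageType + body".
    public static byte[] encode(byte messageType, byte[] body) {
        ByteBuffer buf = ByteBuffer.allocate(1 + body.length);
        buf.put(messageType);
        buf.put(body);
        return buf.array();
    }

    // Consumer side: read the type without touching the Avro bytes.
    public static byte peekType(byte[] message) {
        return message[0];
    }

    // Consumer side: recover the original Avro body when it is needed.
    public static byte[] decodeBody(byte[] message) {
        return Arrays.copyOfRange(message, 1, message.length);
    }
}
```

A consumer that only cares about certain message types can call peekType first and skip the (comparatively expensive) Avro deserialization for everything else.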

I am not sure exactly what you are trying to solve, because if you have a
5MB message going over the wire and your consumer doesn't need that data,
then you have just wasted transport costs in your infrastructure. In that
case Jayesh's suggestion makes sense: make the Kafka message a pointer you
can use to query another system that supports reliable storage of larger
data/files (there are lots of options here), which is also a typical kind
of deployment in these scenarios.
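The pointer pattern can be sketched like this (the class name and the HDFS URI below are hypothetical, for illustration only): the Kafka message value carries just a URI referencing the large Avro file that was already written to external storage, and the consumer fetches the file only when it actually needs the data.

```java
import java.nio.charset.StandardCharsets;

public class AvroPointerCodec {

    // Producer side: the payload put on the Kafka topic is just the URI
    // of the Avro file stored in HDFS/HBase/a shared filesystem.
    public static byte[] toPayload(String avroFileUri) {
        return avroFileUri.getBytes(StandardCharsets.UTF_8);
    }

    // Consumer side: recover the URI; fetching the actual bytes from the
    // external store happens only for consumers that need them.
    public static String fromPayload(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }
}
```

This keeps every Kafka message tiny regardless of the Avro file size, at the cost of a second lookup for consumers that do need the full data.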

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/

On Fri, Jan 2, 2015 at 8:25 PM, Mukesh Jha <me.mukesh....@gmail.com> wrote:

> Indeed, my message size varies between ~500kb and ~5mb per Avro message.
>
> I am using Kafka as I need a scalable pub-sub messaging architecture with
> multiple producers and consumers and a guarantee of delivery.
> Keeping the data on a filesystem or HDFS won't give me that.
>
> Also, in the link below [1] there is LinkedIn's performance benchmark of
> Kafka with respect to message size, which shows that Kafka's throughput
> increases with messages of size ~100kb+.
>
> Agreed, for Kafka a record is key+value; I'm wondering if Kafka can give
> us a way to sneak a peek at a record's metadata via its key.
>
> [1]
>
> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
> On 3 Jan 2015 01:27, "Jayesh Thakrar" <j_thak...@yahoo.com.invalid> wrote:
>
> > Just wondering, Mukesh - the reason you want this feature is because your
> > value payload is not small (tens of kb). I don't know if that is the
> > right usage of Kafka. It might be worthwhile to store the Avro files in a
> > filesystem (regular, cluster fs, HDFS or even HBase), and the value in
> > your Kafka message can be the reference or URI for the Avro file.
> >
> > That way you make the best use of each system's features and strengths.
> >
> > Kafka does have an API to get metadata - the topics, partitions, leader
> > for each partition, etc. If we consider a key-value pair as a "record",
> > then what you are looking for is to get a part of the record (i.e. the
> > key only) and not the whole record - so I would still consider that a
> > data query/API.
> >
> >
>
