Daniel,
We have the same question. We noticed that the compression tests we ran using the built-in performance tester were not realistic. I think on-disk compression was 200:1 (yes, that is two hundred to one).

I had planned to edit the producer performance tester source and do the following:

1. Add an option to read sample data from a provided text file. (My thought was to add a file with 1-5000 rows, whatever I thought my batch size might be.)
2. Load the sample file into an array.
3. Change the code that creates the message to pull a random row from the array.

I am also not a Scala developer, so it would take me a little while to figure this out.

This is on hold right now, as I am looking at options for compressing the message before sending it to Kafka. We had originally not wanted to do this, as we assumed we would not get efficient compression ratios when compressing only a single message; however, we are also talking about sending multiple messages from our application as a single Kafka message.

Our concern with using Kafka compression is the overhead required for decompression on the broker to assign IDs. Here is a good article that describes this:
http://geekmantra.wordpress.com/2013/03/28/compression-in-kafka-gzip-or-snappy/

But again, we haven't decided just yet. We would like to test and evaluate.

Bert

On Mon, Jun 30, 2014 at 2:24 AM, Daniel Compton <d...@danielcompton.net> wrote:

> Hi folks
>
> I was doing some performance testing using the built in Kafka performance
> tester and it seems like it sends messages of size n bytes but with all
> bytes having the value 0x0. Is that correct? Reading the source seemed to
> indicate that too but I'm not a Scala developer so I could be wrong.
>
> Would this affect the performance compared to a real world scenario?
> Obviously you will get very efficient compression rates but apart from
> that, is there likely to be optimisations carried out anywhere between the
> JVM and the network card that won't hold for messages with non zero entropy?
>
> We're going to test this against our production workload so it's not a big
> deal for us but I wondered if this could give others skewed results?
>
> ---
> Daniel
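
As a rough illustration of the sample-file idea in the reply above (steps 1-3), here is a minimal sketch in Python rather than Scala. It is not the performance tester's actual code: the function names are hypothetical, the standard-library gzip module stands in for Kafka's gzip codec, and joining rows with newlines is only an assumed stand-in for batching several application messages into one Kafka message. It loads sample rows from a text file, picks random rows as payloads, and compares the gzip ratio of a zero-filled payload, a sampled batch, and a single sampled message.

```python
import gzip
import random


def load_sample_rows(path):
    """Steps 1-2: read the non-empty lines of a sample file into a list."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]


def random_payload(rows, rng=random):
    """Step 3: pick a random sample row and encode it as a message payload."""
    return rng.choice(rows).encode("utf-8")


def gzip_ratio(payload):
    """Uncompressed size divided by gzip-compressed size."""
    return len(payload) / len(gzip.compress(payload))


def compare(rows, batch_size=200, seed=0):
    """Compare gzip ratios for zero-filled, batched, and single payloads.

    batch_size and seed are illustrative defaults, not Kafka settings.
    """
    rng = random.Random(seed)
    # Assumed batching scheme: several sampled rows joined by newlines.
    batch = b"\n".join(random_payload(rows, rng) for _ in range(batch_size))
    # Same size as the batch, but every byte is 0x0, as the tester sends.
    zeros = bytes(len(batch))
    single = random_payload(rows, rng)
    return {
        "zero_filled": gzip_ratio(zeros),
        "sampled_batch": gzip_ratio(batch),
        "single_message": gzip_ratio(single),
    }


# Example use, assuming a file of sample rows exists at this hypothetical path:
#   rows = load_sample_rows("sample_rows.txt")
#   print(compare(rows))
```

With realistic rows, the zero-filled ratio should come out far higher than the sampled batch, and a single short message will usually compress poorly because of gzip's fixed header and trailer overhead. The exact numbers depend entirely on the sample data.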