Daniel,


We have the same question.  We noticed that the compression tests we ran
using the built-in performance tester were not realistic.  I think the on-disk
compression ratio was 200:1 (yes, that is two hundred to one).  I had planned
to edit the producer performance tester source to do the following:



1. Add an option to read sample data from a provided text file.
   (My thought was to add a file with 1-5000 rows, whatever I thought my
   batch size might be.)

2. Load the sample file into an array.

3. Change the code that creates the message to pull a random row from the
   array.
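
A minimal sketch of the three steps above, written in Python rather than in the Scala tool itself. The function names and the file format (one sample message per line) are assumptions for illustration and are not part of the Kafka performance tester.

```python
import random


def load_sample_rows(path):
    """Steps 1 and 2: read non-empty lines from a text file into a list of bytes."""
    with open(path, "rb") as f:
        # Strip only the line terminator so the payload bytes stay intact.
        return [line.rstrip(b"\r\n") for line in f if line.strip()]


def next_payload(rows, rng=random):
    """Step 3: return one randomly chosen sample row to use as the message body."""
    return rng.choice(rows)
```

The idea would be to call load_sample_rows once at startup and next_payload for every message, so the per-message cost stays small compared with the send itself.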



I am also not a Scala developer, so it would take me a little while to figure
this out.  This is on hold right now, as I am looking at options for
compressing the message before sending it to Kafka.  We had originally not
wanted to do this, as we assumed we would not get efficient compression
ratios when compressing only a single message.  However, we are also talking
about sending multiple messages from our application as a single Kafka
message.  Our concern with using Kafka compression is the overhead of
decompression on the broker to assign IDs.  Here is a good article that
describes this:
http://geekmantra.wordpress.com/2013/03/28/compression-in-kafka-gzip-or-snappy/
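
The single-message versus batched trade-off can be checked roughly before touching Kafka at all. The sketch below is only an illustration under stated assumptions: the JSON records are synthetic stand-ins for real messages, Python's gzip module stands in for whichever codec is finally chosen, and nothing here measures broker-side decompression overhead.

```python
import gzip
import json
import random


def make_records(count, seed=42):
    """Build synthetic JSON records (hypothetical stand-ins for real messages)."""
    rng = random.Random(seed)
    return [
        json.dumps({
            "user_id": rng.randrange(1_000_000),
            "event": rng.choice(["click", "view", "purchase"]),
            "value": round(rng.random() * 100, 2),
        }).encode("utf-8")
        for _ in range(count)
    ]


def compressed_sizes(records):
    """Compare gzip output sizes for three ways of handling the same byte count."""
    raw = sum(len(r) for r in records)
    return {
        "raw": raw,
        # Each record compressed on its own, as with one message per send.
        "per_message": sum(len(gzip.compress(r)) for r in records),
        # All records joined and compressed once, as with application-level batching.
        "batched": len(gzip.compress(b"\n".join(records))),
        # Same number of bytes, all 0x0, as the quoted message suggests the
        # built-in tester sends.
        "zeros": len(gzip.compress(bytes(raw))),
    }


if __name__ == "__main__":
    for name, size in compressed_sizes(make_records(500)).items():
        print(f"{name:12s} {size:8d} bytes")
```

Swapping make_records for rows read from a real sample file would give numbers closer to what production traffic might see.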



But again, we haven’t decided just yet.  We would like to test and evaluate first.



Bert


On Mon, Jun 30, 2014 at 2:24 AM, Daniel Compton <d...@danielcompton.net>
wrote:

> Hi folks
>
> I was doing some performance testing using the built in Kafka performance
> tester and it seems like it sends messages of size n bytes but with all
> bytes having the value 0x0. Is that correct? Reading the source seemed to
> indicate that too but I'm not a Scala developer so I could be wrong.
>
> Would this affect the performance compared to a real world scenario?
> Obviously you will get very efficient compression rates but apart from
> that, is there likely to be optimisations carried out  anywhere between the
> JVM and the network card that won't hold for messages with non zero entropy?
>
> We're going to test this against our production workload so it's not a big
> deal for us but I wondered if this could give others skewed results?
>
> ---
> Daniel
