Jun, let me see if I can fix it first, and then I will submit it back.
Daniel, I was looking at the code some more and was thinking this might work:
https://github.com/apache/kafka/blob/0.8.1/perf/src/main/scala/kafka/perf/ProducerPerformance.scala

On line 246, instead of looping to create messages, I could open a sample file and add the rows as messages line by line until I hit the configured message cap. If I hit the end of the file I would start again at the top. I think I can figure this out.

Bert

On Mon, Jun 30, 2014 at 1:46 PM, Daniel Compton <d...@danielcompton.net> wrote:

> Hi Bert
>
> What you are describing could be done partially with the console producer.
> It will read from a file and send each line to the Kafka broker. You could
> make a really big file, or alter that code to repeat a certain number of
> times. The source is pretty readable; I think that might be an easier
> route to take.
>
> Daniel.
>
>
> On 1/07/2014, at 2:07 am, Bert Corderman <bertc...@gmail.com> wrote:
>
> > Daniel,
> >
> > We have the same question. We noticed that the compression tests we ran
> > using the built-in performance tester were not realistic. I think the
> > on-disk compression ratio was 200:1. (Yes, that is two hundred to one.)
> > I had planned to edit the producer performance tester source and do the
> > following:
> >
> > 1. Add an option to read sample data from a provided text file. (The
> > thought would be to add a file with 1-5000 rows, whatever I thought my
> > batch size might be.)
> >
> > 2. Load the sample file into an array.
> >
> > 3. Change the code that creates a message to pull a random row from the
> > array.
> >
> > I am also not a Scala developer, so it would take me a little while to
> > figure this out. This is on hold right now as I am looking at options
> > for compressing the message before sending it to Kafka. We had
> > originally not wanted to do this, as we assumed we would not get
> > efficient compression ratios when compressing only a single message;
> > however, we are also talking about sending multiple messages from our
> > application as a single Kafka message. Our concern with using Kafka
> > compression is the overhead of decompression on the broker to assign
> > IDs. Here is a good article that describes this:
> > http://geekmantra.wordpress.com/2013/03/28/compression-in-kafka-gzip-or-snappy/
> >
> > But again, we haven't decided just yet; we would like to test and
> > evaluate.
> >
> > Bert
> >
> > On Mon, Jun 30, 2014 at 2:24 AM, Daniel Compton <d...@danielcompton.net>
> > wrote:
> >
> >> Hi folks
> >>
> >> I was doing some performance testing using the built-in Kafka
> >> performance tester, and it seems like it sends messages of size n
> >> bytes but with all bytes having the value 0x0. Is that correct?
> >> Reading the source seemed to indicate that too, but I'm not a Scala
> >> developer so I could be wrong.
> >>
> >> Would this affect the performance compared to a real-world scenario?
> >> Obviously you will get very efficient compression rates, but apart
> >> from that, is there likely to be any optimisation carried out anywhere
> >> between the JVM and the network card that won't hold for messages with
> >> non-zero entropy?
> >>
> >> We're going to test this against our production workload, so it's not
> >> a big deal for us, but I wondered if this could give others skewed
> >> results?
> >>
> >> ---
> >> Daniel
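
A minimal sketch of the two payload-selection variants discussed in this
thread: reading sample rows from a file and cycling through them line by line
with wrap-around, or pulling a random row per message. The object name
SamplePayloadSource and the messageCap value are illustrative, not part of
ProducerPerformance, and the actual producer send call is elided because it
depends on the Kafka producer API version in use:

    import scala.io.Source
    import scala.util.Random

    object SamplePayloadSource {
      // Load the whole sample file into memory; assumes it is small enough
      // to fit (e.g. the 1-5000 rows mentioned above).
      def load(path: String): Array[Array[Byte]] = {
        val src = Source.fromFile(path)
        try src.getLines().map(_.getBytes("UTF-8")).toArray
        finally src.close()
      }

      def main(args: Array[String]): Unit = {
        val rows = load(args(0))
        val rand = new Random
        val messageCap = 10000 // stand-in for the tool's configured message count

        // Variant 1: line by line, starting over at the top when the file ends.
        for (i <- 0 until messageCap) {
          val payload = rows(i % rows.length)
          // producer.send(...) with `payload` would go here in the real tool.
        }

        // Variant 2: a random row per message, per the three-step plan above.
        val randomPayload = rows(rand.nextInt(rows.length))
      }
    }

Either variant avoids the all-zero payloads that made the built-in tester's
compression ratios unrealistic; picking random rows also keeps the broker
from seeing a perfectly periodic byte stream.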