question about compression

2014-07-21 Thread Bert Corderman
In trying to better understand compression I came across the following http://geekmantra.wordpress.com/2013/03/28/compression-in-kafka-gzip-or-snappy/ “in Kafka 0.8, messages for a partition are served by the leader broker. The leader assigns these unique logical offsets to every message it app
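The quoted post explains that in 0.8 the leader assigns offsets to every message it appends. One consequence is that a compressed batch has to be opened on the broker before offsets can be written into it. The sketch below is a deliberately simplified model of that decompress, assign, recompress cycle. It uses gzip over JSON as an assumed stand-in and is not Kafka's actual wire format; `producer_compress` and `leader_append` are hypothetical names.

```python
import gzip
import json

# Simplified model (assumption): a compressed "wrapper" carries a batch of
# inner messages as gzip-compressed JSON. This is NOT Kafka's wire format.

def producer_compress(payloads):
    """Producer side: pack inner messages (no offsets yet) and compress them."""
    inner = [{"offset": None, "value": p} for p in payloads]
    return gzip.compress(json.dumps(inner).encode("utf-8"))

def leader_append(compressed_batch, next_offset):
    """Leader side: to assign offsets it must decompress, rewrite, recompress."""
    inner = json.loads(gzip.decompress(compressed_batch))
    for msg in inner:
        msg["offset"] = next_offset
        next_offset += 1
    recompressed = gzip.compress(json.dumps(inner).encode("utf-8"))
    return recompressed, next_offset

batch = producer_compress(["a", "b", "c"])
stored, next_offset = leader_append(batch, next_offset=100)
offsets = [m["offset"] for m in json.loads(gzip.decompress(stored))]
print(next_offset)  # 103
print(offsets)      # [100, 101, 102]
```

The extra decompress/recompress step on the leader is the server-side overhead that the compression codec choice (gzip vs. Snappy) discussed in these threads trades against.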

Re: Largest known Kafka deployment?

2014-07-07 Thread Bert Corderman
Thanks for the updated deck. I had not seen that one yet. I noticed in the preso you are running RAID10 in prod. Any thoughts on going JBOD? In our testing we saw significant performance improvements. This, of course, comes with the trade-off of manual steps if brokers fail. Bert On Monday, July 7

Re: Kafka producer performance test sending 0x0 byte messages

2014-07-02 Thread Bert Corderman
> > Daniel. > > > On 1/07/2014, at 2:07 am, Bert Corderman wrote: > > > > Daniel, > > > > > > > > We have the same question. We noticed that the compression tests we ran > > using the built in performance tester was not realistic. I think on d

Re: Kafka producer performance test sending 0x0 byte messages

2014-06-30 Thread Bert Corderman
Daniel, We have the same question. We noticed that the compression tests we ran using the built-in performance tester were not realistic. I think on-disk compression was 200:1 (yes, that is two hundred to one). I had planned to try to edit the producer performance tester source and do the foll
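Why a 200:1 ratio is unrealistic can be shown with a quick sketch: payloads made only of 0x00 bytes (as in the thread subject) compress extremely well, while random bytes barely compress at all. This uses Python's `zlib` (DEFLATE, the algorithm behind gzip) as an assumed stand-in for Kafka's gzip codec; real event data will fall somewhere between these two extremes, so a sample of actual production messages is the better test input.

```python
import os
import zlib

def compression_ratio(payload: bytes) -> float:
    """Uncompressed size divided by compressed size (zlib/DEFLATE)."""
    return len(payload) / len(zlib.compress(payload))

size = 100_000
zero_payload = bytes(size)         # all 0x00 bytes, like the perf tester's messages
random_payload = os.urandom(size)  # effectively incompressible lower bound

print(f"zeros:  {compression_ratio(zero_payload):.0f}:1")
print(f"random: {compression_ratio(random_payload):.2f}:1")
```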

Re: Question on message content, compression, multiple messages per kafka message?

2014-06-26 Thread Bert Corderman
verhead is concerned, have you tried running Snappy? > Snappy's performance is good enough to offset the decompression-compression > overhead on the server. > > Thanks, > Neha > > > On Thu, Jun 26, 2014 at 12:42 PM, Bert Corderman > wrote: > > > We are in th

Re: Experiences with larger message sizes

2014-06-26 Thread Bert Corderman
Thanks for the details Luke. At what point would you consider a message too big? Are you using compression? Bert On Thursday, June 26, 2014, Luke Forehand < luke.foreh...@networkedinsights.com> wrote: > I have used 50MB message size and it is not a great idea. First of all > you need to make

Question on message content, compression, multiple messages per kafka message?

2014-06-26 Thread Bert Corderman
We are in the process of engineering a system that will be using kafka. The legacy system is using the local file system and a database as the queue. In terms of scale we process about 35 billion events per day contained in 15 million files. I am looking for feedback on a design decision we ar
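For sizing, the stated volume translates into average rates as below. This is a back-of-the-envelope sketch that assumes the 15 million files arrive per day (the post does not say so explicitly) and that load is spread evenly; peak rates will be higher.

```python
EVENTS_PER_DAY = 35_000_000_000
FILES_PER_DAY = 15_000_000      # assumption: 15 million files per day
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY
events_per_file = EVENTS_PER_DAY / FILES_PER_DAY
files_per_second = FILES_PER_DAY / SECONDS_PER_DAY

print(f"{events_per_second:,.0f} events/s on average")  # 405,093 events/s on average
print(f"{events_per_file:,.0f} events per file")         # 2,333 events per file
print(f"{files_per_second:,.1f} files/s")                # 173.6 files/s
```

These two extremes frame the design decision in the subject line: one event per Kafka message means roughly 405,000 small messages per second, while one file per Kafka message means roughly 174 much larger messages per second.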

Re: Kafka Replication Behavior

2014-04-28 Thread Bert Corderman
Only a single broker holding a replica of a partition needs to be online for that partition's data to be available. In your example, partitions 2 and 3 had copies of data on brokers 0 and 1. When those two brokers went down, your data was unavailable. To withstand two brokers going offline, you would want to change your replication factor to 3. O
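The reasoning in this reply can be sketched with a simplified availability model: a partition stays readable as long as at least one broker holding a replica is online. The model ignores ISR membership and unclean leader election settings, and the replica assignments below are hypothetical, chosen to mirror the example of partitions 2 and 3 living on brokers 0 and 1.

```python
def partition_available(replicas, offline_brokers):
    """Simplified model: readable while any replica broker is still online.
    Ignores ISR membership and unclean leader election settings."""
    return any(b not in offline_brokers for b in replicas)

# Hypothetical assignments for replication factor 2 vs. 3 across brokers 0-3.
rf2 = {2: [0, 1], 3: [1, 0]}
rf3 = {2: [0, 1, 2], 3: [1, 0, 3]}
offline = {0, 1}

rf2_status = {p: partition_available(r, offline) for p, r in rf2.items()}
rf3_status = {p: partition_available(r, offline) for p, r in rf3.items()}
print(rf2_status)  # {2: False, 3: False}
print(rf3_status)  # {2: True, 3: True}
```

With three replicas on three distinct brokers, any two broker failures still leave one replica online, which is why a replication factor of 3 is needed to tolerate two brokers going down.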

Re: performance testing data to share

2014-04-28 Thread Bert Corderman
support when using a single multi-threaded producer. Bert On Sun, Apr 27, 2014 at 11:09 PM, Jun Rao wrote: > Could you run the tests on the 0.8.1.1 release? > > Thanks, > > Jun > > > On Sat, Apr 26, 2014 at 8:23 PM, Bert Corderman > wrote: > > > version 0.8.0

Re: performance testing data to share

2014-04-26 Thread Bert Corderman
version 0.8.0 On Sat, Apr 26, 2014 at 12:03 AM, Jun Rao wrote: > Bert, > > Thanks for sharing. Which version of Kafka were you testing? > > > Jun > > > On Fri, Apr 25, 2014 at 3:11 PM, Bert Corderman > wrote: > > > I have been testing kafka for the past

performance testing data to share

2014-04-25 Thread Bert Corderman
I have been testing kafka for the past week or so and figured I would share my results so far. I am not sure if the formatting will keep in email but here are the results in a google doc...all 1,100 of them https://docs.google.com/spreadsheets/d/1UL-o2MiV0gHZtL4jFWNyqRTQl41LFdM0upjRIwCWNgQ/edit

Re: Kafka Performance Tuning

2014-04-24 Thread Bert Corderman
I had this error before and corrected it by increasing the nofile limit: add an entry for the user running the broker to /etc/security/limits.conf: kafka - nofile 98304 On Thu, Apr 24, 2014 at 1:46 PM, Yashika Gupta wrote: > Jun, > > The detailed logs are as follows: > > 24.04.2014 13:37:31812 I
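To confirm that a new limit actually took effect, the open-file limits of a process can be read with Python's standard `resource` module (Unix only). This is a minimal sketch; the 98304 target is the value from the limits.conf entry in this reply, and `nofile_ok` is a hypothetical helper name. Since limits.conf is typically applied at login through PAM, run the check from a fresh shell as the broker's user; for an already running broker, `/proc/<pid>/limits` shows the limits it was started with.

```python
import resource

TARGET = 98304  # value from the limits.conf entry above

def nofile_ok(target=TARGET):
    """True if this process's soft open-file limit is unlimited or >= target."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft == resource.RLIM_INFINITY or soft >= target

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"nofile soft={soft} hard={hard} ok={nofile_ok()}")
```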

Re: Cluster design distribution and JBOD vs RAID

2014-04-23 Thread Bert Corderman
hour. We run some > consumers in batch and flush on time delay. Other consumers flush per > message processed. It's the flush per message that causes the high volume. > > Push back on DEVs and software architecture if they want to flush per > message. Do it where it's on
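The difference between per-message flushing and the batch/time-delay approach quoted here can be sketched with a small buffering sink. `BatchingSink`, `max_batch`, and `max_delay` are hypothetical names for illustration, not a Kafka API; the sink flushes when the buffer is full or when the delay has elapsed since the last flush.

```python
import time

class BatchingSink:
    """Hypothetical consumer-side sink: buffers records and flushes when the
    batch is full or max_delay seconds have passed since the last flush."""

    def __init__(self, flush_fn, max_batch=1000, max_delay=5.0, clock=time.monotonic):
        self.flush_fn = flush_fn
        self.max_batch = max_batch
        self.max_delay = max_delay
        self.clock = clock
        self.buffer = []
        self.last_flush = clock()

    def add(self, record):
        self.buffer.append(record)
        full = len(self.buffer) >= self.max_batch
        stale = self.clock() - self.last_flush >= self.max_delay
        if full or stale:
            self.flush()

    def flush(self):
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []  # start a new list; the flushed one is handed off
        self.last_flush = self.clock()

flushes = []
sink = BatchingSink(flushes.append, max_batch=100, max_delay=60.0)
for i in range(1_000):
    sink.add(i)
sink.flush()  # drain anything left over (nothing here)
print(len(flushes))  # 10
```

Flushing per message would have issued 1,000 flushes for the same input. Note that this sketch only checks the time limit when a record arrives; a production sink would also need a timer so an idle stream still gets flushed.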

Re: Unable to get off the ground following the "quick start" section

2014-04-17 Thread Bert Corderman
I can't speak to the quick start; however, I found the following very helpful when I was getting started. http://www.michael-noll.com/blog/2013/03/13/running-a-multi-broker-apache-kafka-cluster-on-a-single-node/ On Thu, Apr 17, 2014 at 9:22 AM, Stephen Boesch wrote: > I have tried to use kafka

Re: Cluster design distribution and JBOD vs RAID

2014-04-17 Thread Bert Corderman
> topic. If you replicate 3 then you will end up with 3x active partitions per > broker. > > 1024 partitions per topic / 24 brokers =~ 43 leader partitions per broker > per topic. > BERT> Thanks for the example. Good to see others are using larger partition counts. > >
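The partition arithmetic quoted above works out as follows. This sketch assumes partitions and replicas are spread as evenly as possible across the 24 brokers.

```python
import math

partitions_per_topic = 1024
brokers = 24
replication_factor = 3

leaders_per_broker = partitions_per_topic / brokers
replicas_per_broker = partitions_per_topic * replication_factor / brokers
max_leaders_on_one_broker = math.ceil(leaders_per_broker)

print(f"~{leaders_per_broker:.1f} leaders per broker per topic")    # ~42.7 leaders per broker per topic
print(f"~{replicas_per_broker:.0f} replicas per broker per topic")  # ~128 replicas per broker per topic
print(f"max leaders on any broker: {max_leaders_on_one_broker}")    # max leaders on any broker: 43
```

Since 1024 = 24 × 42 + 16, an even assignment gives 16 brokers 43 leaders and 8 brokers 42, hence the "=~ 43" figure; with replication factor 3, each broker hosts about 128 partition replicas per topic.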

Cluster design distribution and JBOD vs RAID

2014-04-16 Thread Bert Corderman
I am wondering what others are doing in terms of cluster separation (if at all). For example, let’s say I need 24 nodes to support a given workload. What are the tradeoffs between a single 24-node cluster and 2 x 12-node clusters? The application I support can support separation of data