You can use a thread pool to write to HBase and create a separate pool of consumer threads, or add more consumer processes. The bottleneck in this case is writing to HBase.
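Roughly, the write side could look something like the sketch below - untested, and it assumes the 0.94-era HTable client; the table name, column family, batch size, and pool size are just placeholders to show the shape. Each consumer thread batches up Puts and hands the full batch to a shared writer pool, so a slow HBase round trip does not stall the consumer iterator:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// One instance of this class per Kafka consumer thread, so the batch list
// below never needs synchronization. The writer pool is shared.
public class ItineraryWriter {
    private static final byte[] CF = Bytes.toBytes("d");       // placeholder column family
    private static final byte[] QUAL = Bytes.toBytes("itin");  // placeholder qualifier
    private static final int BATCH_SIZE = 1000;                // tune against your row size

    private static final Configuration CONF = HBaseConfiguration.create();

    // HTable is not thread-safe, so each writer thread gets its own instance.
    private static final ThreadLocal<HTable> TABLES = new ThreadLocal<HTable>() {
        @Override protected HTable initialValue() {
            try {
                HTable t = new HTable(CONF, "itineraries");  // placeholder table name
                t.setAutoFlush(false);                       // buffer puts client-side
                return t;
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    };

    private final ExecutorService writers;                    // shared HBase writer pool
    private List<Put> batch = new ArrayList<Put>(BATCH_SIZE);

    public ItineraryWriter(ExecutorService sharedWriterPool) {
        this.writers = sharedWriterPool;
    }

    // Called by the consumer thread for every message it pulls off its stream.
    public void onMessage(byte[] rowKey, byte[] payload) {
        Put put = new Put(rowKey);
        put.add(CF, QUAL, payload);
        batch.add(put);
        if (batch.size() >= BATCH_SIZE) {
            final List<Put> toWrite = batch;
            batch = new ArrayList<Put>(BATCH_SIZE);
            writers.submit(new Runnable() {
                public void run() {
                    try {
                        HTable table = TABLES.get();
                        table.put(toWrite);     // one round trip per batch
                        table.flushCommits();   // push the client-side write buffer out
                    } catch (Exception e) {
                        // retry / dead-letter as appropriate
                    }
                }
            });
        }
    }
}

You would create one ItineraryWriter per consumer thread and pass them all the same fixed-size writer pool; sizing that pool and BATCH_SIZE against what your region servers can absorb is where the real tuning is.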
Regards,

Libo

-----Original Message-----
From: Graeme Wallace [mailto:graeme.wall...@farecompare.com]
Sent: Wednesday, October 02, 2013 4:36 PM
To: users
Subject: Re: Strategies for improving Consumer throughput

Yes, the consumers are definitely behind - we can see that from examining the offsets.

On Wed, Oct 2, 2013 at 1:59 PM, Joe Stein <crypt...@gmail.com> wrote:

> Are you sure the consumers are behind? Could the pause be because the
> stream is empty, and producing messages is what is behind the consumption?
>
> What if you shut off your consumers for 5 minutes and then start them
> again - do the consumers behave the same way?
>
> /*******************************************
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> ********************************************/
>
>
> On Wed, Oct 2, 2013 at 3:54 PM, Graeme Wallace <
> graeme.wall...@farecompare.com> wrote:
>
> > Hi All,
> >
> > We've got processes that produce many millions of itineraries per minute.
> > We would like to get them into HBase (so we can query for chunks of them
> > later), so our idea was to write each itinerary as a message into Kafka -
> > that way we can have not only consumers that write to HBase, but also
> > other consumers that provide some sort of real-time monitoring service,
> > and an archive service as well.
> >
> > The problem is that we don't really know enough about how best to do this
> > effectively with Kafka, so that the producers can run flat out and the
> > consumers can run flat out too. We've tried having one topic, with
> > multiple partitions to match the spindles on our broker hardware (12 on
> > each), and setting up a thread per partition on the consumer side.
> >
> > At the moment, our particular problem is that the consumers just can't
> > keep up. We can see from logging that the consumer threads seem to run in
> > bursts, then pause (as yet we don't know what the pause is - we don't
> > think it's GC). Anyway, does what we are doing with one topic and
> > multiple partitions sound correct, or do we need to change? Are there any
> > tricks to speed up consumption? (We've tried changing the fetch size;
> > it doesn't help much.) Am I correct in assuming we can have one thread
> > per partition for consumption?
> >
> > Thanks in advance,
> >
> > Graeme
> >
> > --
> > Graeme Wallace
> > CTO
> > FareCompare.com
> > O: 972 588 1414
> > M: 214 681 9018
>

--
Graeme Wallace
CTO
FareCompare.com
O: 972 588 1414
M: 214 681 9018
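For reference, the one-topic, one-thread-per-partition layout described above looks roughly like the following sketch with the Kafka 0.8 high-level consumer; the topic name, ZooKeeper address, group id, thread count, and fetch size are placeholders, not values from this thread. One thread per stream is the expected pattern, and streams beyond the partition count would simply sit idle.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class ItineraryConsumer {
    public static void main(String[] args) {
        final int numThreads = 12;  // roughly one per partition on the topic

        Properties props = new Properties();
        props.put("zookeeper.connect", "zkhost:2181");    // placeholder
        props.put("group.id", "hbase-writers");           // placeholder
        props.put("fetch.message.max.bytes", "2097152");  // fetch size is one knob to tune

        ConsumerConnector connector =
            kafka.consumer.Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // Ask for numThreads streams on the topic; each stream is consumed by one thread.
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put("itineraries", numThreads);      // placeholder topic name
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
            connector.createMessageStreams(topicCountMap);

        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        for (final KafkaStream<byte[], byte[]> stream : streams.get("itineraries")) {
            pool.submit(new Runnable() {
                public void run() {
                    ConsumerIterator<byte[], byte[]> it = stream.iterator();
                    while (it.hasNext()) {
                        MessageAndMetadata<byte[], byte[]> msg = it.next();
                        byte[] payload = msg.message();
                        // hand payload off to the HBase writer pool here rather than
                        // writing to HBase inline on the consumer thread
                    }
                }
            });
        }
    }
}

Each runnable would pass msg.message() to something like the writer pool sketched above rather than doing the put itself, so the iterator keeps pulling while the HBase writes are in flight.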