OK - so we figured out what the problem was with the consumers lagging behind.
We were pushing 800 Mbit/s+ to the consumer interface, so the 1Gb network interface was maxed out.

Graeme

On Wed, Oct 2, 2013 at 2:35 PM, Graeme Wallace <graeme.wall...@farecompare.com> wrote:

> Yes, definitely the consumers are behind - we can see that from examining the
> offsets.
>
>
> On Wed, Oct 2, 2013 at 1:59 PM, Joe Stein <crypt...@gmail.com> wrote:
>
>> Are you sure the consumers are behind? Could the pause be because the
>> stream is empty, and producing messages is what is behind the consumption?
>>
>> What if you shut off your consumers for 5 minutes and then start them
>> again - do the consumers behave the same way?
>>
>> /*******************************************
>>  Joe Stein
>>  Founder, Principal Consultant
>>  Big Data Open Source Security LLC
>>  http://www.stealth.ly
>>  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
>> ********************************************/
>>
>>
>> On Wed, Oct 2, 2013 at 3:54 PM, Graeme Wallace <
>> graeme.wall...@farecompare.com> wrote:
>>
>> > Hi All,
>> >
>> > We've got processes that produce many millions of itineraries per minute.
>> > We would like to get them into HBase (so we can query for chunks of them
>> > later), so our idea was to write each itinerary as a message into Kafka -
>> > so that not only can we have consumers that write to HBase, but also other
>> > consumers that may provide some sort of real-time monitoring service and
>> > also an archive service.
>> >
>> > The problem is, we don't really know enough about how best to do this
>> > effectively with Kafka so that the producers can run flat out and the
>> > consumers can run flat out too. We've tried having one topic with multiple
>> > partitions to match the spindles on our broker h/w (12 on each), and
>> > setting up a thread per partition on the consumer side.
>> >
>> > At the moment, our particular problem is that the consumers just can't keep
>> > up. We can see from logging that the consumer threads seem to run in
>> > bursts, then pause (as yet we don't know what the pause is - we don't think
>> > it's GC). Anyway, does what we are doing with one topic and multiple
>> > partitions sound correct, or do we need to change? Any tricks to speed up
>> > consumption? (We've tried changing the fetch size - it doesn't help much.)
>> > Am I correct in assuming we can have one thread per partition for
>> > consumption?
>> >
>> > Thanks in advance,
>> >
>> > Graeme
>> >
>> > --
>> > Graeme Wallace
>> > CTO
>> > FareCompare.com
>> > O: 972 588 1414
>> > M: 214 681 9018
>
>
> --
> Graeme Wallace
> CTO
> FareCompare.com
> O: 972 588 1414
> M: 214 681 9018

--
Graeme Wallace
CTO
FareCompare.com
O: 972 588 1414
M: 214 681 9018
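P.S. For anyone hitting the same questions later: with the 0.8-era high-level consumer, one thread per partition is the usual pattern - the connector hands back one KafkaStream per partition and each stream gets its own handling thread. Below is a minimal sketch of that setup; the topic name, group id, ZooKeeper address and the HBase hand-off are illustrative placeholders, not the actual FareCompare code.

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class ItineraryConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zkhost:2181");    // placeholder
        props.put("group.id", "hbase-writer");            // placeholder
        props.put("fetch.message.max.bytes", "1048576");  // the 0.8 "fetch size" knob

        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        final int numPartitions = 12;  // one stream, and one thread, per partition
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(
                        Collections.singletonMap("itineraries", numPartitions));

        ExecutorService pool = Executors.newFixedThreadPool(numPartitions);
        for (final KafkaStream<byte[], byte[]> stream : streams.get("itineraries")) {
            pool.submit(new Runnable() {
                public void run() {
                    ConsumerIterator<byte[], byte[]> it = stream.iterator();
                    while (it.hasNext()) {
                        MessageAndMetadata<byte[], byte[]> record = it.next();
                        // hand record.message() off to the HBase / monitoring / archive writer
                    }
                }
            });
        }
    }
}

Per-partition lag can be confirmed with the ConsumerOffsetChecker tool bundled with 0.8, e.g. bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zkconnect zkhost:2181 --group hbase-writer.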
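And since the bottleneck turned out to be the 1Gb interface rather than the consumer code, producer-side compression is one knob that trades CPU for bytes on the wire. A rough sketch against the 0.8 producer config - broker list, topic name and payload are again placeholders:

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class ItineraryProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092");   // placeholder
        props.put("serializer.class", "kafka.serializer.DefaultEncoder"); // raw byte[] payloads
        props.put("compression.codec", "snappy");  // compress batches before they hit the wire
        props.put("producer.type", "async");       // batch sends for throughput

        Producer<byte[], byte[]> producer = new Producer<byte[], byte[]>(new ProducerConfig(props));

        byte[] itinerary = new byte[0];  // stand-in for a real serialized itinerary
        producer.send(new KeyedMessage<byte[], byte[]>("itineraries", itinerary));
        producer.close();
    }
}

With compression.codec set, the broker stores the compressed batches and consumers decompress client-side, so the saving applies on the consumer-side NIC as well.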