Hi all,

We have processes that produce many millions of itineraries per minute. We would like to get them into HBase so that we can query for chunks of them later. Our idea is to write each itinerary as a message into Kafka, so that besides the consumers that write to HBase we can also have other consumers that provide some sort of real-time monitoring service and an archive service.
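To make the fan-out idea concrete, here is a rough sketch of what we are after. It is a toy in-memory model written only for illustration, not Kafka's API: `ToyLog`, `poll`, and the group names are all made up. The point it shows is that each kind of consumer keeps its own read position over the same stream of itineraries, so the HBase writer, the monitor, and the archiver do not steal messages from one another.

```python
from collections import defaultdict


class ToyLog:
    """Hypothetical in-memory stand-in for a single topic partition."""

    def __init__(self):
        self.messages = []               # append-only list of messages
        self.offsets = defaultdict(int)  # next offset to read, per consumer group

    def append(self, message):
        self.messages.append(message)

    def poll(self, group, max_messages=100):
        """Return the next batch for `group`, advancing only that group's offset."""
        start = self.offsets[group]
        batch = self.messages[start:start + max_messages]
        self.offsets[group] = start + len(batch)
        return batch


log = ToyLog()
for i in range(5):
    log.append(f"itinerary-{i}")

# Each consumer group reads the full stream at its own pace.
hbase_batch = log.poll("hbase-writer", max_messages=3)
monitor_batch = log.poll("monitoring")
archive_batch = log.poll("archive")
```

In this model a slow HBase writer only falls behind on its own offset; the monitoring and archive readers are unaffected, which is the behaviour we are hoping to get from Kafka.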
The problem is that we don't know enough about how to do this effectively with Kafka, so that both the producers and the consumers can run flat out. We've tried one topic with multiple partitions, matching the number of spindles on our broker hardware (12 on each), and a thread per partition on the consumer side.

At the moment, our particular problem is that the consumers just can't keep up. We can see from logging that the consumer threads run in bursts, followed by a pause. We don't yet know what causes the pause, but we don't think it's GC.

So, our questions:

- Does what we are doing with one topic and multiple partitions sound correct, or do we need to change it?
- Are there any tricks to speed up consumption? We've tried changing the fetch size, but it doesn't help much.
- Am I correct in assuming we can have one thread per partition for consumption?

Thanks in advance,
Graeme

--
Graeme Wallace
CTO
FareCompare.com
O: 972 588 1414
M: 214 681 9018