Make sure the fetch batch size and the local consumer queue sizes are large
enough; setting them too low will leave your throughput bound by the
broker<->client round-trip latency.

This would be controlled using the following properties (see the sketch below):
- fetch.message.max.bytes
- queued.max.message.chunks
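
For example, with the 0.8-era Java high-level consumer (which is what these
property names match) the tuning would look roughly like this. This is a
sketch only; the ZooKeeper address, group id and sizes are illustrative,
not tuned recommendations:

    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class ConsumerTuningSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181"); // assumed address
            props.put("group.id", "throughput-test");         // illustrative name
            // Bigger fetches amortize the broker<->client round trip:
            props.put("fetch.message.max.bytes", "4194304");  // 4 MB per fetch
            // More buffered chunks keep consumer threads fed between fetches:
            props.put("queued.max.message.chunks", "10");
            ConsumerConnector consumer =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
            // ... create message streams and consume as usual ...
        }
    }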

On the producer side you would want to play with (sketch below):
- queue.buffering.max.ms and queue.buffering.max.messages
- batch.num.messages
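
And a matching sketch for the 0.8 async producer; again, the broker
address, topic name and values are assumptions for illustration:

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class ProducerTuningSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "localhost:9092"); // assumed broker
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("producer.type", "async");
            // Buffer locally for up to 100 ms, holding up to 100000 messages:
            props.put("queue.buffering.max.ms", "100");
            props.put("queue.buffering.max.messages", "100000");
            // Dequeue and send in batches of up to 1000 messages:
            props.put("batch.num.messages", "1000");
            Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
            producer.send(new KeyedMessage<String, String>("test-topic", "hello"));
            producer.close();
        }
    }

The usual trade-off applies: larger buffers and batches improve throughput
at the cost of per-message latency.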

Memory on the broker should only affect disk cache performance; the more
the merrier of course, but it depends on your use case. With a bit of luck
the disk caches are already hot for the data you are reading (e.g.,
recently produced messages).

Consuming millions of messages per second on a quad-core i7 with 8 gigs of
RAM is possible without breaking a sweat, given that the disk caches are hot.


Regards,
Magnus


2013/10/11 Bruno D. Rodrigues <bruno.rodrig...@litux.org>

>
> > On Thu, Oct 10, 2013 at 3:57 PM, Bruno D. Rodrigues
> > <bruno.rodrig...@litux.org> wrote:
> >
> >> My personal newbie experience, which is surely completely wrong and
> >> misconfigured, got me up to 70MB/sec, either with controlled 1K
> >> messages (hence 70Kmsg/sec) as well as with more random data (test
> >> data from 100 bytes to a couple MB). First I thought the 70MB were
> >> the hard disk limit, but when I got the same result both with a
> >> proper linux server with a 10K disk, as well as with a Mac mini with
> >> a 5400rpm disk, I got confused.
> >>
> >> The mini has 2G, the linux server has 8 or 16, can't recall at the
> >> moment.
> >>
> >> The test was performed both with single and multi producers and
> >> consumers. One producer = 70MB, two producers = 35MB each, and so
> >> forth. Running standalone instances on each server, same value.
> >> Running both together in 2 partition 2 replica crossed mode, same
> >> result.
> >>
> >> As far as I understood, more memory just means more kernel buffer
> >> space to compensate for the lack of disk speed, as kafka seems to
> >> not really depend on memory for the queueing.
>
> On 11/10/2013, at 17:28, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Hello,
> >
> > In most cases of Kafka, the network bottleneck will be hit before the
> > disk bottleneck. So maybe you want to check your network capacity to
> > see if it has been saturated.
>
> They are all connected to Gbit ethernet cards and proper network
> routers. I can easily get way above 950Mbps up and down between each
> machine, and even between multiple machines. Gbit is 128MB/s, and
> 70MB/s is 560Mbps, so far so good - 56% of network capacity is a
> goodish value. But then I enable snappy and get the same 70MB on the
> input and output side with only 20MB/sec on the network, so it surely
> isn't a network limit. It's also not the input or output side - the
> input reads a pre-processed mmapped file at 150MB/sec uncached (SSD)
> and up to 3GB/sec when loaded into memory, and the output simply
> counts the messages and their sizes.
>
> One weird thing is that the kafka process never seems to cross 100%
> CPU in top or equivalent commands. Top shows 100% for each CPU, so a
> multi-threaded process should be able to go up to 400% (both the linux
> server and the mac mini have 2 cores with hyperthreading, so "almost"
> 4 CPUs).
>
