> - Decreased num.partitions and log.flush.interval on the brokers from
>   64/10k to 32/100 in order to lower the average flush time (we were
>   previously always hitting the default flush interval since no partitions
Hmm, that is a pretty low value for the flush interval, leading to higher
disk usage. Do you use dedicated disks for the Kafka data logs? Also, what
sort of disks do you use?

Thanks,
Neha

> On Tue, Apr 23, 2013 at 7:53 AM, Jun Rao <jun...@gmail.com> wrote:
> >
> > You can run kafka.tools.ConsumerOffsetChecker to check the consumer
> > lag. If the consumer is lagging, this indicates a problem on the
> > consumer side.
> >
> > Thanks,
> >
> > Jun
> >
> > On Mon, Apr 22, 2013 at 9:13 PM, Andrew Neilson <arsneil...@gmail.com> wrote:
> > >
> > > Hmm, it is highly unlikely that that is the culprit... there is lots
> > > of bandwidth available for me to use. I will definitely keep that in
> > > mind, though. I was working on this today and have some tidbits of
> > > additional information and thoughts that you might be able to shed
> > > some light on:
> > >
> > > - I mentioned I have 2 consumers, but each consumer is running with 8
> > >   threads for this topic (and each consumer has 8 cores available).
> > > - When I initially asked for help, the brokers were configured with
> > >   num.partitions=1. I've since tried higher numbers (3, 64) and
> > >   haven't seen much of an improvement, aside from forcing both
> > >   consumer apps to handle messages (with the overall performance not
> > >   changing much).
> > > - I ran into this article
> > >   http://riccomini.name/posts/kafka/2012-10-05-kafka-consumer-memory-tuning/
> > >   and tried a variety of options for queuedchunks.max and fetch.size
> > >   with no significant results (meaning it did not achieve the goal of
> > >   constantly processing hundreds or thousands of messages per second,
> > >   which is similar to the rate of input).
> > >   I would not be surprised if I'm wrong, but this made me start to
> > >   think that the problem may lie outside of the consumers.
> > > - Would the combination of a high number of partitions (64) and a
> > >   high log.flush.interval (10k) prevent logs from flushing as often
> > >   as they need to for my desired rate of consumption (even with
> > >   log.default.flush.interval.ms=1000)?
> > >
> > > Despite the changes I mentioned, the behaviour is still the consumers
> > > receiving large spikes of messages mixed with periods of complete
> > > inactivity, and overall a long delay between messages being written
> > > and messages being read (about 2 minutes). Anyway... as always, I
> > > greatly appreciate any help.
> > >
> > > On Sun, Apr 21, 2013 at 8:50 PM, Jun Rao <jun...@gmail.com> wrote:
> > > >
> > > > Is your network shared? If so, another possibility is that some
> > > > other apps are consuming the bandwidth.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Sun, Apr 21, 2013 at 12:23 PM, Andrew Neilson <arsneil...@gmail.com> wrote:
> > > > >
> > > > > Thanks very much for the reply, Neha! So I swapped out the
> > > > > consumer that processes the messages with one that just prints
> > > > > them. It does indeed achieve a much better rate at peaks, but can
> > > > > still nearly zero out (if not completely zero out). I plotted the
> > > > > messages printed in Graphite to show the behaviour I'm seeing
> > > > > (this is messages printed per second):
> > > > >
> > > > > https://www.dropbox.com/s/7u7uyrefw6inetu/Screen%20Shot%202013-04-21%20at%2011.44.38%20AM.png
> > > > >
> > > > > The peaks are over ten thousand per second, and the troughs can
> > > > > go below 10 per second just prior to another peak.
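On the flush question quoted above, a little arithmetic suggests the count-based threshold can never fire: with log.flush.interval=10000 messages per log and the input rate spread over 64 partitions, each partition accumulates messages far too slowly to reach 10k before the 1000 ms time-based flush kicks in. A sketch of that reasoning (the ~400 msgs/sec figure is the upper rate quoted in the thread; even spread across partitions is an assumption):

```python
# Sketch: which flush trigger fires first under the settings discussed?
# Assumes ~400 msgs/sec topic-wide (upper figure from the thread) spread
# evenly over 64 partitions.

msgs_per_sec = 400
partitions = 64
count_threshold = 10_000      # log.flush.interval (messages, per log)
time_threshold_s = 1.0        # log.default.flush.interval.ms = 1000

per_partition_rate = msgs_per_sec / partitions             # 6.25 msgs/sec
secs_to_hit_count = count_threshold / per_partition_rate   # 1600 s

print(f"seconds to reach the count threshold: {secs_to_hit_count:.0f}")
print(f"time-based flush fires every {time_threshold_s:.0f} s")
# The 1 s time-based flush fires ~1600x sooner, so the count threshold is
# effectively dead weight -- consistent with "always hitting the default
# flush interval" at the top of the thread.
```

So the 64-partition/10k-message combination should not by itself delay flushes; the time-based interval dominates either way.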
> > > > > I know that there are plenty of messages available, because the
> > > > > ones currently being processed are still from Friday afternoon,
> > > > > so this may or may not have something to do with this pattern.
> > > > >
> > > > > Is there anything I can do to avoid the periods of lower
> > > > > performance? Ideally I would be processing messages as soon as
> > > > > they are written.
> > > > >
> > > > > On Sun, Apr 21, 2013 at 8:49 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> > > > > >
> > > > > > Some of the reasons a consumer is slow are:
> > > > > > 1. Small fetch size
> > > > > > 2. Expensive message processing
> > > > > >
> > > > > > Are you processing the received messages in the consumer? Have
> > > > > > you tried running the console consumer for this topic to see
> > > > > > how it performs?
> > > > > >
> > > > > > Thanks,
> > > > > > Neha
> > > > > >
> > > > > > On Sun, Apr 21, 2013 at 1:59 AM, Andrew Neilson <arsneil...@gmail.com> wrote:
> > > > > > >
> > > > > > > I am currently running a deployment with 3 brokers, 3 ZK, 3
> > > > > > > producers, 2 consumers, and 15 topics. I should first point
> > > > > > > out that this is my first project using Kafka ;). The issue
> > > > > > > I'm seeing is that the consumers are only processing about 15
> > > > > > > messages per second from what should be the largest topic
> > > > > > > they are consuming (we're sending 200-400 ~300-byte messages
> > > > > > > per second to this topic). I should note that I'm using a
> > > > > > > high-level ZK consumer and ZK 3.4.3.
> > > > > > >
> > > > > > > I have a strong feeling I have not configured things properly,
> > > > > > > so I could definitely use some guidance.
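The rates quoted above already explain why the consumers are still working through Friday's messages: consuming ~15 msgs/sec from a topic receiving 200-400 msgs/sec means the backlog grows continuously. A quick sketch of the deficit, assuming the midpoint of the quoted input range:

```python
# Sketch: backlog growth from the rates quoted in this thread.
# Assumes ~300 msgs/sec produced (midpoint of 200-400) vs the ~15 msgs/sec
# observed at the consumers.

in_rate = 300    # msgs/sec produced to the topic (assumed midpoint)
out_rate = 15    # msgs/sec observed at the consumers

deficit = in_rate - out_rate
print(f"backlog grows by {deficit} msgs/sec, "
      f"~{deficit * 3600:,} msgs/hour")
```

At roughly a million messages per hour of growth, any fixed write-to-read delay will keep stretching until consumption at least matches production.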
> > > > > > > Here is my broker configuration:
> > > > > > >
> > > > > > > brokerid=1
> > > > > > > port=9092
> > > > > > > socket.send.buffer=1048576
> > > > > > > socket.receive.buffer=1048576
> > > > > > > max.socket.request.bytes=104857600
> > > > > > > log.dir=/home/kafka/data
> > > > > > > num.partitions=1
> > > > > > > log.flush.interval=10000
> > > > > > > log.default.flush.interval.ms=1000
> > > > > > > log.default.flush.scheduler.interval.ms=1000
> > > > > > > log.retention.hours=168
> > > > > > > log.file.size=536870912
> > > > > > > enable.zookeeper=true
> > > > > > > zk.connect=XXX
> > > > > > > zk.connectiontimeout.ms=1000000
> > > > > > >
> > > > > > > Here is my producer config:
> > > > > > >
> > > > > > > zk.connect=XXX
> > > > > > > producer.type=async
> > > > > > > compression.codec=0
> > > > > > >
> > > > > > > Here is my consumer config:
> > > > > > >
> > > > > > > zk.connect=XXX
> > > > > > > zk.connectiontimeout.ms=100000
> > > > > > > groupid=XXX
> > > > > > > autooffset.reset=smallest
> > > > > > > socket.buffersize=1048576
> > > > > > > fetch.size=10485760
> > > > > > > queuedchunks.max=10000
> > > > > > >
> > > > > > > Thanks for any assistance you can provide,
> > > > > > >
> > > > > > > Andrew
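Two sanity checks fall out of the configs quoted above. First, the high-level consumer assigns at most one thread per partition, so with num.partitions=1 per broker across 3 brokers the topic has only 3 partitions, capping the 2 consumers x 8 threads mentioned later in the thread. Second, per the memory-tuning article linked in the thread, each fetch queue can buffer up to roughly queuedchunks.max x fetch.size bytes. A sketch (both rules of thumb are assumptions about the 0.7-era consumer, not taken from this thread):

```python
# Sketch: two sanity checks on the configs quoted in this thread.

# 1) Consumer parallelism is capped by partition count: the high-level
#    consumer uses at most one thread per partition.
brokers = 3
partitions_per_broker = 1            # num.partitions
consumer_threads = 2 * 8             # 2 consumer processes x 8 threads

total_partitions = brokers * partitions_per_broker
active_threads = min(consumer_threads, total_partitions)
print(f"threads doing work: {active_threads} of {consumer_threads}")

# 2) Worst-case fetch-queue memory: each queued chunk can be up to
#    fetch.size bytes, and up to queuedchunks.max chunks may be buffered.
fetch_size = 10_485_760              # fetch.size (10 MB)
queued_chunks_max = 10_000           # queuedchunks.max

max_queue_bytes = fetch_size * queued_chunks_max
print(f"worst-case buffered bytes per queue: ~{max_queue_bytes / 2**30:.0f} GiB")
```

If those figures hold, 13 of the 16 consumer threads sit idle with the original num.partitions=1 setting, and the ~98 GiB fetch-queue bound is far beyond any reasonable heap, so lowering queuedchunks.max (or fetch.size) is worth revisiting alongside the partition count.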