Jun,

Pardon the radio silence. I booted up a new broker, created a topic with three partitions and a replication factor of one, and used the *kafka-producer-perf-test.sh* script to generate load (using messages of roughly the same size as ours). There was a slight increase in CPU usage (~5-10%) on 0.8.2.0-rc2 compared to 0.8.1.1, but that was about it.

I upgraded our staging cluster to 0.8.2.0 earlier this week, and had to add an additional broker due to increased load after the upgrade (note that the incoming load on the cluster has been pretty much consistent). Since the upgrade we've also been seeing a 2-3x increase in latency.

I'm considering downgrading to 0.8.1.1 again to see if it resolves our issues.

Best regards,
Mathias

On Tue Feb 03 2015 at 6:44:36 PM Jun Rao <j...@confluent.io> wrote:

> Mathias,
>
> The new hprof doesn't reveal anything new to me. We did fix the logic in
> using Purgatory in 0.8.2, which could potentially drive up the CPU usage a
> bit. To verify that, could you do your test on a single broker (with
> replication factor 1) between 0.8.1 and 0.8.2 and see if there is any
> significant difference in CPU usage?
>
> Thanks,
>
> Jun
>
> On Tue, Feb 3, 2015 at 5:09 AM, Mathias Söderberg <
> mathias.soederb...@gmail.com> wrote:
>
> > Jun,
> >
> > I re-ran the hprof test, for about 30 minutes again, for 0.8.2.0-rc2 with
> > the same version of snappy that 0.8.1.1 used. Attached the logs.
> > Unfortunately there wasn't any improvement as the node running 0.8.2.0-rc2
> > still had a higher load and CPU usage.
> >
> > Best regards,
> > Mathias
> >
> > On Tue Feb 03 2015 at 4:40:31 AM Jaikiran Pai <jai.forums2...@gmail.com>
> > wrote:
> >
> >> On Monday 02 February 2015 11:03 PM, Jun Rao wrote:
> >> > Jaikiran,
> >> >
> >> > The fix you provided is probably unnecessary. The channel that we use in
> >> > SimpleConsumer (BlockingChannel) is configured to be blocking. So even
> >> > though the read from the socket is in a loop, each read blocks if there
> >> > are no bytes received from the broker. So, that shouldn't cause extra CPU
> >> > consumption.
> >> Hi Jun,
> >>
> >> Of course, you are right!
> >> I forgot that while reading the thread dump in hprof output, one has to
> >> be aware that the thread state isn't shown and the thread need not
> >> necessarily be doing any CPU activity.
> >>
> >> -Jaikiran
> >>
> >>
> >> >
> >> > Thanks,
> >> >
> >> > Jun
> >> >
> >> > On Mon, Jan 26, 2015 at 10:05 AM, Mathias Söderberg <
> >> > mathias.soederb...@gmail.com> wrote:
> >> >
> >> >> Hi Neha,
> >> >>
> >> >> I sent an e-mail earlier today, but noticed now that it didn't actually go
> >> >> through.
> >> >>
> >> >> Anyhow, I've attached two files, one with output from a 10 minute run and
> >> >> one with output from a 30 minute run. Realized that maybe I should've done
> >> >> one or two runs with 0.8.1.1 as well, but nevertheless.
> >> >>
> >> >> I upgraded our staging cluster to 0.8.2.0-rc2, and I'm seeing the same CPU
> >> >> usage as with the beta version (basically pegging all cores). If I manage
> >> >> to find the time I'll do another run with hprof on the rc2 version later
> >> >> today.
> >> >>
> >> >> Best regards,
> >> >> Mathias
> >> >>
> >> >> On Tue Dec 09 2014 at 10:08:21 PM Neha Narkhede <n...@confluent.io> wrote:
> >> >>
> >> >>> The following should be sufficient
> >> >>>
> >> >>> java
> >> >>> -agentlib:hprof=cpu=samples,depth=100,interval=20,lineno=y,thread=y,file=kafka.hprof
> >> >>> <classname>
> >> >>>
> >> >>> You would need to start the Kafka server with the settings above for
> >> >>> some time until you observe the problem.
> >> >>>
> >> >>> On Tue, Dec 9, 2014 at 3:47 AM, Mathias Söderberg <
> >> >>> mathias.soederb...@gmail.com> wrote:
> >> >>>
> >> >>>> Hi Neha,
> >> >>>>
> >> >>>> Yeah sure. I'm not familiar with hprof, so any particular options I should
> >> >>>> include or just run with defaults?
> >> >>>>
> >> >>>> Best regards,
> >> >>>> Mathias
> >> >>>>
> >> >>>> On Mon Dec 08 2014 at 7:41:32 PM Neha Narkhede <n...@confluent.io> wrote:
> >> >>>>> Thanks for reporting the issue. Would you mind running hprof and sending
> >> >>>>> the output?
> >> >>>>>
> >> >>>>> On Mon, Dec 8, 2014 at 1:25 AM, Mathias Söderberg <
> >> >>>>> mathias.soederb...@gmail.com> wrote:
> >> >>>>>
> >> >>>>>> Good day,
> >> >>>>>>
> >> >>>>>> I upgraded a Kafka cluster from v0.8.1.1 to v0.8.2-beta and noticed that
> >> >>>>>> the CPU usage on the broker machines went up by roughly 40%, from ~60% to
> >> >>>>>> ~100% and am wondering if anyone else has experienced something similar?
> >> >>>>>> The load average also went up by 2x-3x.
> >> >>>>>>
> >> >>>>>> We're running on EC2 and the cluster currently consists of four m1.xlarge,
> >> >>>>>> with roughly 1100 topics / 4000 partitions. Using Java 7 (1.7.0_65 to be
> >> >>>>>> exact) and Scala 2.9.2. Configurations can be found over here:
> >> >>>>>> https://gist.github.com/mthssdrbrg/7df34a795e07eef10262.
> >> >>>>>>
> >> >>>>>> I'm assuming that this is not expected behaviour for 0.8.2-beta?
> >> >>>>>>
> >> >>>>>> Best regards,
> >> >>>>>> Mathias
> >> >>>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>> --
> >> >>>>> Thanks,
> >> >>>>> Neha
> >> >>>>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Thanks,
> >> >>> Neha
> >> >>>
> >>
> >> >
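
The thread compares hprof runs from 0.8.1.1 and 0.8.2.x brokers by eye. As a rough aid for that kind of comparison, here is a minimal sketch (not something anyone in the thread used) that reads the "CPU SAMPLES" table hprof writes when run with cpu=samples and lists the methods whose share of samples changed the most between two runs. The function names are hypothetical, and the row layout (rank, self %, accum %, count, trace, method) is assumed from hprof's standard text output.

```python
import re
from typing import Dict, List, Tuple

# One row of hprof's "CPU SAMPLES" table (layout assumed from its text output):
# rank, self %, accum %, count, trace id, method.
_ROW = re.compile(r"^\s*\d+\s+([\d.]+)%\s+[\d.]+%\s+(\d+)\s+\d+\s+(\S+)\s*$")


def parse_cpu_samples(text: str) -> Dict[str, float]:
    """Return {method: self %}, summed over all rows of the CPU SAMPLES section."""
    shares: Dict[str, float] = {}
    in_section = False
    for line in text.splitlines():
        if line.startswith("CPU SAMPLES BEGIN"):
            in_section = True
        elif line.startswith("CPU SAMPLES END"):
            in_section = False
        elif in_section:
            match = _ROW.match(line)
            if match:  # the column-header row does not match and is skipped
                self_pct, _count, method = match.groups()
                # A method can appear under several trace ids, so add them up.
                shares[method] = shares.get(method, 0.0) + float(self_pct)
    return shares


def biggest_shifts(before: str, after: str, top: int = 10) -> List[Tuple[str, float]]:
    """List (method, change in self %) pairs, largest absolute change first."""
    old, new = parse_cpu_samples(before), parse_cpu_samples(after)
    deltas = {m: new.get(m, 0.0) - old.get(m, 0.0) for m in set(old) | set(new)}
    return sorted(deltas.items(), key=lambda kv: (-abs(kv[1]), kv[0]))[:top]
```

Usage would be along the lines of biggest_shifts(open("old.hprof").read(), open("new.hprof").read()), where both file names are placeholders. Jaikiran's caveat still applies to whatever this prints: hprof samples do not show thread state, so a method that tops the list may simply be blocked in a socket read rather than burning CPU.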