I saw a very similar jump in CPU usage when I tried upgrading from 0.8.1.1 to 0.8.2.0 today in a test environment. The Kafka cluster there consists of two m1.large instances handling 2,000 partitions across 32 topics. CPU usage rose from ~40% into the 150%–190% range, and the load average went from under 1 to over 4. Downgrading to 0.8.1.1 brought both CPU and load back to their previous values.
If there's more info that would be helpful, please let me know. (Rough versions of the hprof setup and the single-broker test discussed in the thread are included below the quote.)

On Thu, Feb 12, 2015 at 4:17 PM, Mathias Söderberg <mathias.soederb...@gmail.com> wrote:

> Jun,
>
> Pardon the radio silence. I booted up a new broker, created a topic with
> three (3) partitions and replication factor one (1), and used the
> kafka-producer-perf-test.sh script to generate load (using messages of
> roughly the same size as ours). There was a slight increase in CPU usage
> (~5-10%) on 0.8.2.0-rc2 compared to 0.8.1.1, but that was about it.
>
> I upgraded our staging cluster to 0.8.2.0 earlier this week or so, and had
> to add an additional broker due to increased load after the upgrade (note
> that the incoming load on the cluster has been pretty much consistent).
> Since the upgrade we've been seeing a 2-3x increase in latency as well.
> I'm considering downgrading to 0.8.1.1 again to see if it resolves our
> issues.
>
> Best regards,
> Mathias
>
> On Tue Feb 03 2015 at 6:44:36 PM Jun Rao <j...@confluent.io> wrote:
>
> > Mathias,
> >
> > The new hprof doesn't reveal anything new to me. We did fix the logic in
> > using Purgatory in 0.8.2, which could potentially drive up the CPU usage
> > a bit. To verify that, could you do your test on a single broker (with
> > replication factor 1) between 0.8.1 and 0.8.2 and see if there is any
> > significant difference in CPU usage?
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Feb 3, 2015 at 5:09 AM, Mathias Söderberg <
> > mathias.soederb...@gmail.com> wrote:
> >
> > > Jun,
> > >
> > > I re-ran the hprof test, for about 30 minutes again, for 0.8.2.0-rc2
> > > with the same version of snappy that 0.8.1.1 used. Attached the logs.
> > > Unfortunately there wasn't any improvement, as the node running
> > > 0.8.2.0-rc2 still had higher load and CPU usage.
> > >
> > > Best regards,
> > > Mathias
> > >
> > > On Tue Feb 03 2015 at 4:40:31 AM Jaikiran Pai <jai.forums2...@gmail.com>
> > > wrote:
> > >
> > > > On Monday 02 February 2015 11:03 PM, Jun Rao wrote:
> > > > > Jaikiran,
> > > > >
> > > > > The fix you provided is probably unnecessary. The channel that we
> > > > > use in SimpleConsumer (BlockingChannel) is configured to be
> > > > > blocking. So even though the read from the socket is in a loop,
> > > > > each read blocks if there are no bytes received from the broker.
> > > > > So, that shouldn't cause extra CPU consumption.
> > > >
> > > > Hi Jun,
> > > >
> > > > Of course, you are right! I forgot that while reading the thread dump
> > > > in hprof output, one has to be aware that the thread state isn't
> > > > shown and the thread need not necessarily be doing any CPU activity.
> > > >
> > > > -Jaikiran
> > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Mon, Jan 26, 2015 at 10:05 AM, Mathias Söderberg <
> > > > > mathias.soederb...@gmail.com> wrote:
> > > > >
> > > > > > Hi Neha,
> > > > > >
> > > > > > I sent an e-mail earlier today, but noticed now that it didn't
> > > > > > actually go through.
> > > > > >
> > > > > > Anyhow, I've attached two files, one with output from a 10 minute
> > > > > > run and one with output from a 30 minute run. Realized that maybe
> > > > > > I should've done one or two runs with 0.8.1.1 as well, but
> > > > > > nevertheless.
> > > > > >
> > > > > > I upgraded our staging cluster to 0.8.2.0-rc2, and I'm seeing the
> > > > > > same CPU usage as with the beta version (basically pegging all
> > > > > > cores). If I manage to find the time I'll do another run with
> > > > > > hprof on the rc2 version later today.
> > > > > >
> > > > > > Best regards,
> > > > > > Mathias
> > > > > >
> > > > > > On Tue Dec 09 2014 at 10:08:21 PM Neha Narkhede <n...@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > The following should be sufficient:
> > > > > > >
> > > > > > > java
> > > > > > > -agentlib:hprof=cpu=samples,depth=100,interval=20,lineno=y,thread=y,file=kafka.hprof
> > > > > > > <classname>
> > > > > > >
> > > > > > > You would need to start the Kafka server with the settings above
> > > > > > > for some time until you observe the problem.
> > > > > > >
> > > > > > > On Tue, Dec 9, 2014 at 3:47 AM, Mathias Söderberg <
> > > > > > > mathias.soederb...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Neha,
> > > > > > > >
> > > > > > > > Yeah, sure. I'm not familiar with hprof, so are there any
> > > > > > > > particular options I should include, or should I just run
> > > > > > > > with the defaults?
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > Mathias
> > > > > > > >
> > > > > > > > On Mon Dec 08 2014 at 7:41:32 PM Neha Narkhede <n...@confluent.io>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for reporting the issue. Would you mind running hprof
> > > > > > > > > and sending the output?
> > > > > > > > >
> > > > > > > > > On Mon, Dec 8, 2014 at 1:25 AM, Mathias Söderberg <
> > > > > > > > > mathias.soederb...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Good day,
> > > > > > > > > >
> > > > > > > > > > I upgraded a Kafka cluster from v0.8.1.1 to v0.8.2-beta and
> > > > > > > > > > noticed that the CPU usage on the broker machines went up
> > > > > > > > > > by roughly 40%, from ~60% to ~100%, and am wondering if
> > > > > > > > > > anyone else has experienced something similar? The load
> > > > > > > > > > average also went up by 2x-3x.
> > > > > > > > > >
> > > > > > > > > > We're running on EC2 and the cluster currently consists of
> > > > > > > > > > four m1.xlarge, with roughly 1100 topics / 4000 partitions.
> > > > > > > > > > Using Java 7 (1.7.0_65 to be exact) and Scala 2.9.2.
> > > > > > > > > > Configurations can be found over here:
> > > > > > > > > > https://gist.github.com/mthssdrbrg/7df34a795e07eef10262.
> > > > > > > > > >
> > > > > > > > > > I'm assuming that this is not expected behaviour for
> > > > > > > > > > 0.8.2-beta?
> > > > > > > > > >
> > > > > > > > > > Best regards,
> > > > > > > > > > Mathias
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Thanks,
> > > > > > > > > Neha
> > > > > > >
> > > > > > > --
> > > > > > > Thanks,
> > > > > > > Neha
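For anyone else who wants to capture a similar hprof run, the settings Neha posted can be attached to a broker roughly like this. This is only a sketch: it assumes the stock bin/kafka-server-start.sh (via kafka-run-class.sh) passes KAFKA_OPTS through to the broker JVM, and the output path is just an example.

    # Attach the hprof CPU sampler to the broker JVM (sampling profiler,
    # 20 ms interval, 100-frame-deep stacks). Let the broker run until the
    # high CPU usage shows up; the profile is written out when the broker
    # JVM is shut down.
    export KAFKA_OPTS="-agentlib:hprof=cpu=samples,depth=100,interval=20,lineno=y,thread=y,file=/tmp/kafka.hprof"
    bin/kafka-server-start.sh config/server.properties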
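Similarly, the single-broker comparison Jun asked for boils down to something like the commands below, run once against an 0.8.1.1 broker and once against an 0.8.2.0 broker. Flag names are from memory of the 0.8.x tools and the topic name, message size, and message count are placeholders, so please double-check against each script's --help output before comparing numbers.

    # One topic, three partitions, replication factor 1, on a single broker.
    bin/kafka-topics.sh --create --zookeeper localhost:2181 \
      --topic perf-test --partitions 3 --replication-factor 1

    # Generate load with messages roughly the size of the production ones.
    bin/kafka-producer-perf-test.sh --broker-list localhost:9092 \
      --topics perf-test --messages 1000000 --message-size 1024 --threads 4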