Mathias,

What's the ack mode you used in the producer? Could you share the command you used to run kafka-producer-perf-test.sh?

Thanks,

Jun

On Thu, Feb 12, 2015 at 1:17 PM, Mathias Söderberg <mathias.soederb...@gmail.com> wrote:

> Jun,
>
> Pardon the radio silence. I booted up a new broker, created a topic with
> three (3) partitions and replication factor one (1) and used the
> *kafka-producer-perf-test.sh* script to generate load (using messages of
> roughly the same size as ours). There was a slight increase in CPU usage
> (~5-10%) on 0.8.2.0-rc2 compared to 0.8.1.1, but that was about it.
>
> I upgraded our staging cluster to 0.8.2.0 earlier this week or so, and had
> to add an additional broker due to increased load after the upgrade (note
> that the incoming load on the cluster has been pretty much consistent).
> Since the upgrade we've been seeing a 2-3x increase in latency as well.
> I'm considering downgrading to 0.8.1.1 again to see if it resolves our
> issues.
>
> Best regards,
> Mathias
>
> On Tue Feb 03 2015 at 6:44:36 PM Jun Rao <j...@confluent.io> wrote:
>
> > Mathias,
> >
> > The new hprof doesn't reveal anything new to me. We did fix the logic in
> > using Purgatory in 0.8.2, which could potentially drive up the CPU usage a
> > bit. To verify that, could you do your test on a single broker (with
> > replication factor 1) between 0.8.1 and 0.8.2 and see if there is any
> > significant difference in CPU usage?
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Feb 3, 2015 at 5:09 AM, Mathias Söderberg <
> > mathias.soederb...@gmail.com> wrote:
> >
> > > Jun,
> > >
> > > I re-ran the hprof test, for about 30 minutes again, for 0.8.2.0-rc2 with
> > > the same version of snappy that 0.8.1.1 used. Attached the logs.
> > > Unfortunately there wasn't any improvement as the node running 0.8.2.0-rc2
> > > still had a higher load and CPU usage.
> > >
> > > Best regards,
> > > Mathias
> > >
> > > On Tue Feb 03 2015 at 4:40:31 AM Jaikiran Pai <jai.forums2...@gmail.com>
> > > wrote:
> > >
> > >> On Monday 02 February 2015 11:03 PM, Jun Rao wrote:
> > >> > Jaikiran,
> > >> >
> > >> > The fix you provided is probably unnecessary. The channel that we use in
> > >> > SimpleConsumer (BlockingChannel) is configured to be blocking. So even
> > >> > though the read from the socket is in a loop, each read blocks if there
> > >> > are no bytes received from the broker. So, that shouldn't cause extra CPU
> > >> > consumption.
> > >>
> > >> Hi Jun,
> > >>
> > >> Of course, you are right! I forgot that while reading the thread dump in
> > >> hprof output, one has to be aware that the thread state isn't shown and
> > >> the thread need not necessarily be doing any CPU activity.
> > >>
> > >> -Jaikiran
> > >>
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Jun
> > >> >
> > >> > On Mon, Jan 26, 2015 at 10:05 AM, Mathias Söderberg <
> > >> > mathias.soederb...@gmail.com> wrote:
> > >> >
> > >> >> Hi Neha,
> > >> >>
> > >> >> I sent an e-mail earlier today, but noticed now that it didn't actually go
> > >> >> through.
> > >> >>
> > >> >> Anyhow, I've attached two files, one with output from a 10 minute run and
> > >> >> one with output from a 30 minute run. Realized that maybe I should've done
> > >> >> one or two runs with 0.8.1.1 as well, but nevertheless.
> > >> >>
> > >> >> I upgraded our staging cluster to 0.8.2.0-rc2, and I'm seeing the same CPU
> > >> >> usage as with the beta version (basically pegging all cores). If I manage
> > >> >> to find the time I'll do another run with hprof on the rc2 version later
> > >> >> today.
> > >> >>
> > >> >> Best regards,
> > >> >> Mathias
> > >> >>
> > >> >> On Tue Dec 09 2014 at 10:08:21 PM Neha Narkhede <n...@confluent.io>
> > >> >> wrote:
> > >> >>
> > >> >>> The following should be sufficient
> > >> >>>
> > >> >>> java
> > >> >>> -agentlib:hprof=cpu=samples,depth=100,interval=20,lineno=y,thread=y,file=kafka.hprof
> > >> >>> <classname>
> > >> >>>
> > >> >>> You would need to start the Kafka server with the settings above for
> > >> >>> some time until you observe the problem.
> > >> >>>
> > >> >>> On Tue, Dec 9, 2014 at 3:47 AM, Mathias Söderberg <
> > >> >>> mathias.soederb...@gmail.com> wrote:
> > >> >>>
> > >> >>>> Hi Neha,
> > >> >>>>
> > >> >>>> Yeah sure. I'm not familiar with hprof, so any particular options I
> > >> >>>> should include or just run with defaults?
> > >> >>>>
> > >> >>>> Best regards,
> > >> >>>> Mathias
> > >> >>>>
> > >> >>>> On Mon Dec 08 2014 at 7:41:32 PM Neha Narkhede <n...@confluent.io>
> > >> >>>> wrote:
> > >> >>>>> Thanks for reporting the issue. Would you mind running hprof and sending
> > >> >>>>> the output?
> > >> >>>>>
> > >> >>>>> On Mon, Dec 8, 2014 at 1:25 AM, Mathias Söderberg <
> > >> >>>>> mathias.soederb...@gmail.com> wrote:
> > >> >>>>>
> > >> >>>>>> Good day,
> > >> >>>>>>
> > >> >>>>>> I upgraded a Kafka cluster from v0.8.1.1 to v0.8.2-beta and noticed that
> > >> >>>>>> the CPU usage on the broker machines went up by roughly 40 percentage
> > >> >>>>>> points, from ~60% to ~100%, and am wondering if anyone else has
> > >> >>>>>> experienced something similar? The load average also went up by 2x-3x.
> > >> >>>>>>
> > >> >>>>>> We're running on EC2 and the cluster currently consists of four m1.xlarge,
> > >> >>>>>> with roughly 1100 topics / 4000 partitions. Using Java 7 (1.7.0_65 to be
> > >> >>>>>> exact) and Scala 2.9.2.
> > >> >>>>>> Configurations can be found over here:
> > >> >>>>>> https://gist.github.com/mthssdrbrg/7df34a795e07eef10262.
> > >> >>>>>>
> > >> >>>>>> I'm assuming that this is not expected behaviour for 0.8.2-beta?
> > >> >>>>>>
> > >> >>>>>> Best regards,
> > >> >>>>>> Mathias
> > >> >>>>>>
> > >> >>>>>
> > >> >>>>> --
> > >> >>>>> Thanks,
> > >> >>>>> Neha
> > >> >>>>>
> > >> >>>
> > >> >>> --
> > >> >>> Thanks,
> > >> >>> Neha
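
A note on Jun's point in the quoted thread that a read loop over a blocking channel should not burn CPU while it waits: the sketch below illustrates the general behaviour with plain Python sockets. It is not Kafka's BlockingChannel or SimpleConsumer code; the function names (read_loop, run_demo), message sizes and pauses are made up for illustration. The reader thread loops on a blocking recv() while a writer sends a few small messages with pauses in between, and the reader records its own CPU time next to the elapsed wall-clock time.

```python
import socket
import threading
import time


def read_loop(sock, stats):
    """Read from a blocking socket in a loop until the peer closes it."""
    cpu_start = time.thread_time()    # CPU time consumed by this thread only
    wall_start = time.monotonic()
    received = 0
    while True:
        chunk = sock.recv(4096)       # blocks until data arrives or the peer closes
        if not chunk:                 # empty bytes: the peer closed the connection
            break
        received += len(chunk)
    stats["bytes"] = received
    stats["cpu_seconds"] = time.thread_time() - cpu_start
    stats["wall_seconds"] = time.monotonic() - wall_start


def run_demo(messages=3, pause=0.2):
    """Hypothetical harness: one reader thread, one slow writer."""
    reader_sock, writer_sock = socket.socketpair()   # sockets are blocking by default
    stats = {}
    reader = threading.Thread(target=read_loop, args=(reader_sock, stats))
    reader.start()
    for _ in range(messages):
        time.sleep(pause)                 # the reader sits blocked in recv() meanwhile
        writer_sock.sendall(b"x" * 100)
    writer_sock.close()                   # makes the reader's recv() return b""
    reader.join()
    reader_sock.close()
    return stats


if __name__ == "__main__":
    print(run_demo())
```

On a typical run the reader's CPU time should be a small fraction of the wall-clock time, which is also why a thread that shows up in an hprof thread dump inside a socket read is not necessarily using CPU, as Jaikiran notes in his reply.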