@Ben, the big GC stalls could be related to the 16GB max heap size.  With a
larger heap, each garbage collection takes longer when one does occur.  In
general, Kafka shouldn't need more than a 5GB heap, and lowering your heap
size, combined with using G1GC (preferably on Java 8), should give you better
GC performance.
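
Something along these lines, set in the environment before starting the broker,
would be a reasonable starting point (just a sketch -- the 5G figure and the G1
flags below are suggestions to validate against your own workload, not a
prescription):

export KAFKA_HEAP_OPTS="-Xms5G -Xmx5G"
export KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 \
  -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC \
  -Djava.awt.headless=true"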

@Lawrence, you probably want to raise your max heap size from 256M to 1G
(KAFKA_HEAP_OPTS="-Xmx1G") and see how it goes.  The total memory used on the
system might be 43%, but JVM heap usage is a separate measurement: the JVM
will throw an out-of-memory error once the heap needs to grow beyond the 256M
limit, even if the system still has memory available.
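
For example, assuming you start the broker with the stock scripts (which pick
up KAFKA_HEAP_OPTS from the environment via kafka-run-class.sh):

export KAFKA_HEAP_OPTS="-Xms1G -Xmx1G"
bin/kafka-server-start.sh config/server.properties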

On Thu, Jun 9, 2016 at 1:08 PM, Stephen Powis <spo...@salesforce.com> wrote:

> Hey Ben
>
> Using G1 with those settings appears to be working well for us.  Infrequent
> younggen/minor GCs averaging a run time of 12ms, no full GCs in the 24
> hours logged that I uploaded.  I'd say enable the GC log flags and let it
> run for a bit, then change a setting or two and compare.
>
>
>
> On Thu, Jun 9, 2016 at 3:59 PM, Ben Osheroff <b...@zendesk.com.invalid>
> wrote:
>
> > We've been having quite a few symptoms that appear to be big GC stalls
> > (nonsensical ZK session timeouts) with the following config:
> >
> > -Xmx16g
> > -Xms16g
> > -server
> > -XX:+CMSClassUnloadingEnabled
> > -XX:+CMSScavengeBeforeRemark
> > -XX:+UseG1GC
> > -XX:+DisableExplicitGC
> >
> > Next steps will be to turn on gc logging and try to confirm that the ZK
> > session timeouts are indeed GC pauses (they look like major
> > collections), but meanwhile, does anyone have experience with whether
> > these options (taken from https://kafka.apache.org/081/ops.html) help?
> > I'd prefer not to just blindly turn on options if possible.
> >
> > -XX:PermSize=48m
> > -XX:MaxPermSize=48m
> > -XX:MaxGCPauseMillis=20
> > -XX:InitiatingHeapOccupancyPercent=35
> >
> > Thanks!
> > Ben Osheroff
> > Zendesk.com
> >
> > On Thu, Jun 09, 2016 at 03:52:41PM -0400, Stephen Powis wrote:
> > > NOTE -- GC tuning is by no means within my area of expertise, so I'm not
> > > sure I'd use our info as any kind of benchmark.
> > >
> > > But in the interest of sharing, we use the following options
> > >
> > > export KAFKA_HEAP_OPTS="-Xmx12G -Xms12G"
> > > >
> > > > export KAFKA_JVM_PERFORMANCE_OPTS="-server -Djava.awt.headless=true
> > > > -XX:MaxPermSize=48M -verbose:gc -Xloggc:/var/log/kafka/gc.log
> > > > -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintTenuringDistribution
> > > > -XX:+PrintGCApplicationStoppedTime -XX:+PrintTLAB -XX:+DisableExplicitGC
> > > > -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M
> > > > -XX:+UseCompressedOops -XX:+AlwaysPreTouch -XX:+UseG1GC
> > > > -XX:MaxGCPauseMillis=20 -XX:+HeapDumpOnOutOfMemoryError
> > > > -XX:HeapDumpPath=/var/log/kafka/heapDump.log"
> > > >
> > >
> > > You can then take your gc.log files and run them through an analyzer
> > > tool... I've attached a link to one of our brokers' GC logs run through
> > > gceasy.io.
> > >
> > > https://protect-us.mimecast.com/s/wXqqBJuqdZb1Tn
> > >
> > > On Thu, Jun 9, 2016 at 3:39 PM, Lawrence Weikum <lwei...@pandora.com>
> > > wrote:
> > >
> > > > Hi Tom,
> > > >
> > > > Currently we’re using the default settings – no special tuning
> > > > whatsoever.  I think the kafka-run-class.sh has this:
> > > >
> > > >
> > > > # Memory options
> > > > if [ -z "$KAFKA_HEAP_OPTS" ]; then
> > > >   KAFKA_HEAP_OPTS="-Xmx256M"
> > > > fi
> > > >
> > > > # JVM performance options
> > > > if [ -z "$KAFKA_JVM_PERFORMANCE_OPTS" ]; then
> > > >   KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20
> > > > -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC
> > > > -Djava.awt.headless=true"
> > > > fi
> > > >
> > > >
> > > > Is this the confluent doc you were referring to?
> > > > https://protect-us.mimecast.com/s/arXXBOspkvORCD
> > > >
> > > > Thanks!
> > > >
> > > > Lawrence Weikum
> > > >
> > > >
> > > > On 6/9/16, 1:32 PM, "Tom Crayford" <tcrayf...@heroku.com> wrote:
> > > >
> > > > >Hi Lawrence,
> > > > >
> > > > >What JVM options were you using? There's a few pages in the confluent
> > > > >docs on JVM tuning iirc. We simply use the G1 and a 4GB Max heap and
> > > > >things work well (running many thousands of clusters).
> > > > >
> > > > >Thanks
> > > > >Tom Crayford
> > > > >Heroku Kafka
> > > > >
> > > > >On Thursday, 9 June 2016, Lawrence Weikum <lwei...@pandora.com>
> > > > >wrote:
> > > > >
> > > > >> Hello all,
> > > > >>
> > > > >> We’ve been running a benchmark test on a Kafka cluster of ours running
> > > > >> 0.9.0.1 – slamming it with messages to see when/if things might break.
> > > > >> During our test, we caused two brokers to throw OutOfMemory errors
> > > > >> (looks like from the Heap) even though each machine still has 43% of
> > > > >> the total memory unused.
> > > > >>
> > > > >> I’m curious what JVM optimizations are recommended for Kafka brokers?
> > > > >> Or if there aren’t any that are recommended, what are some
> > > > >> optimizations others are using to keep the brokers running smoothly?
> > > > >>
> > > > >> Best,
> > > > >>
> > > > >> Lawrence Weikum
> > > > >>
> > > > >>
> > > >
> > > >
> >
>



-- 
Dustin Cote
confluent.io
