With acks=1 (request.required.acks=1), the broker acknowledges a message as soon as the leader has written it, so acked messages can be lost if the leader fails before the followers have copied them.
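
A minimal sketch of the stronger setting (property names from the 0.8-era
producer configs; in varnishkafka.conf the same topic property appears with
a "kafka.topic." prefix, but treat the exact spellings as my assumption):

  # 1  = ack as soon as the leader has appended the message (your current setting)
  # -1 = ack only after all in-sync replicas have the message
  request.required.acks=-1
  # how long the leader may wait to satisfy request.required.acks before erroring
  request.timeout.ms=10000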

Thanks,

Jun

On Mon, Sep 29, 2014 at 8:04 AM, Andrew Otto <o...@wikimedia.org> wrote:

> This happened again to me this weekend.  I've done some sleuthing, and
> definitely can see some crazy paging stats when this lock up happens.  For
> the curious, more info can be found here:
> https://bugzilla.wikimedia.org/show_bug.cgi?id=69667.  I had tuned
> dirty_expire_centisecs from 30 seconds to 10, but this still happened
> again.  I'll continue to troubleshoot and tune.  This is slow going because it
> is not regularly reproducible.  I have to make a single change and then
> wait a week or two for the timeout to occur.
>
> Here's a related question.  When the timeout happens, we lose some
> messages.  Our producer is varnishkafka, which uses the librdkafka producer
> client.  librdkafka keeps track of produce errors.  We
> have kafka.topic.request.required.acks = 1.  According to librdkafka, all
> messages sent have been ACKed by the leader of the partition to which the
> messages are sent.  Also, when we lose messages due to this timeout, the
> broker that times out is always the controller.  When it attempts to
> reconnect to Zookeeper, we see:
>
>   INFO  kafka.utils.ZkUtils$  - conflict in /controller data:
> {"version":1,"brokerid":21,"timestamp":"1411879981756"} stored data:
> {"version":1,"brokerid":22,"timestamp":"1407187809296"}
>
> In the case when a controller drops out of the ISR for a few seconds, is it
> possible for this confused broker to drop ACKed messages?
>
>
>
> On Thu, Jul 3, 2014 at 12:48 AM, Jun Rao <jun...@gmail.com> wrote:
>
> > Are you on Linux? We have seen this pattern (user/sys time low and real
> > time high in GC time) before. In our case, the problem was due to disk
> > I/Os. When there are lots of dirty pages (in our case, this is caused by
> > log4j logging), Linux can draft user threads (in this case GC threads) to
> > flush the dirty pages. So, all that "real" time was spent on disk I/Os
> > rather than on actual GC work. The fix is to tune dirty_expire_centisecs
> > and dirty_writeback_centisecs to flush dirty pages more frequently and
> > avoid such drafting.
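> >
> > A rough sketch of that tuning (the exact values are assumptions to be
> > tuned per host; both knobs are in hundredths of a second):
> >
> >   # expire dirty pages after 10s instead of the 30s default
> >   sysctl -w vm.dirty_expire_centisecs=1000
> >   # wake the flusher threads every 1s instead of every 5s
> >   sysctl -w vm.dirty_writeback_centisecs=100
> >
> >   # persist across reboots
> >   echo 'vm.dirty_expire_centisecs = 1000'   >> /etc/sysctl.conf
> >   echo 'vm.dirty_writeback_centisecs = 100' >> /etc/sysctl.conf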
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Wed, Jul 2, 2014 at 1:32 PM, Andrew Otto <o...@wikimedia.org> wrote:
> >
> > > Hi again!
> > >
> > > I've been having this issue consistently since I first started this
> > > thread, but it was happening infrequently enough for me to brush it
> > > aside and just run an election to rebalance brokers.
> > >
> > > I recently expanded (and reinstalled) our Kafka cluster so that it now
> > > has 4 brokers with a default replication factor of 3 for each partition.
> > > I also switched over to the G1GC as recommended here:
> > > https://kafka.apache.org/081/ops.html (even though we are still running
> > > Kafka 0.8.0; we hope to upgrade soon).
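> > >
> > > For the record, a minimal sketch of one way to pass G1 settings through
> > > the stock startup scripts (the heap size, pause target, and log path
> > > below are placeholder values of mine, not the ones from the ops page):
> > >
> > >   # read by kafka-run-class.sh / kafka-server-start.sh
> > >   export KAFKA_HEAP_OPTS="-Xms4g -Xmx4g"
> > >   export KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC \
> > >     -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 \
> > >     -verbose:gc -Xloggc:/var/log/kafka/kafka-gc.log \
> > >     -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps"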
> > >
> > > Now, only one of the 4 brokers (analytics1021, the same problem broker
> > > we saw before) gets its ZK connection expired, and even more frequently
> > > than before.  Previously it was less than once a week; now I am seeing
> > > this happen multiple times a day.
> > >
> > > I've posted all the relevant logs from a recent event here:
> > > https://gist.github.com/ottomata/e42480446c627ea0af22
> > >
> > > This includes the GC log on the offending Kafka broker during the time
> > > this happened.  I am pretty green when it comes to GC tuning, but I do
> > > see this interesting stat:
> > >
> > >
> > >  [Times: user=0.14 sys=0.00, real=11.47 secs]
> > >
> > > Did Kafka's JVM really just take 11.47 secs to do a GC there?  I'm
> > > probably missing something, but I don't see which part of that real
> > > time summary makes up the bulk of that GC time.
> > >
> > > This is strange, right?  This broker is identically configured to all
> > > its peers, and should be handling on average the exact same amount and
> > > type of traffic.  Anyone have any advice?
> > >
> > > Thanks!
> > > -Andrew Otto
> > >
> > >
> > >
> > >
> > > On Fri, Mar 21, 2014 at 6:48 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> > >
> > > > I see, that makes sense. Thanks a lot for clarifying!
> > > >
> > > > -Neha
> > > >
> > > >
> > > > On Fri, Mar 21, 2014 at 11:01 AM, Bae, Jae Hyeon <metac...@gmail.com> wrote:
> > > >
> > > > > Let me clarify the situation. I forgot to mention that my case might
> > > > > not be a general one, because Netflix uses Apache Curator as its main
> > > > > zookeeper client and the ZkClient in Kafka has to be bridged to Apache
> > > > > Curator, so the behavior I have seen might not be the general one.
> > > > >
> > > > > Kafka's ZKSessionExpireListener.handleNewSession() reinstates all
> > > > > ephemeral nodes and watchers, but handleNewSession() was not kicked
> > > > > off in my case. So, I created a Netflix-internal version of ZkClient
> > > > > to replace the ephemeral node creation and watcher reinstatement.
> > > > >
> > > > > I have a plan to remove all external dependencies from Kafka soon,
> > > > > and I might be able to mention more about Kafka's zk resiliency then.
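> > > > >
> > > > > Roughly, that recipe-based registration looks like the sketch below
> > > > > (a bare sketch only; the connect string, znode path, and payload are
> > > > > placeholders, not Kafka's actual registration code):
> > > > >
> > > > > import java.util.concurrent.TimeUnit;
> > > > > import org.apache.curator.framework.CuratorFramework;
> > > > > import org.apache.curator.framework.CuratorFrameworkFactory;
> > > > > import org.apache.curator.framework.recipes.nodes.PersistentEphemeralNode;
> > > > > import org.apache.curator.retry.ExponentialBackoffRetry;
> > > > >
> > > > > public class BrokerRegistrationSketch {
> > > > >   public static void main(String[] args) throws Exception {
> > > > >     // placeholder connect string and chroot
> > > > >     CuratorFramework client = CuratorFrameworkFactory.newClient(
> > > > >         "zk1:2181,zk2:2181,zk3:2181/kafka",
> > > > >         new ExponentialBackoffRetry(1000, 29));
> > > > >     client.start();
> > > > >
> > > > >     // the recipe re-creates the ephemeral node whenever the ZK session is lost
> > > > >     byte[] brokerInfo = "{\"host\":\"broker1\",\"port\":9092}".getBytes("UTF-8");
> > > > >     PersistentEphemeralNode node = new PersistentEphemeralNode(
> > > > >         client, PersistentEphemeralNode.Mode.EPHEMERAL,
> > > > >         "/brokers/ids/21", brokerInfo);   // placeholder znode path
> > > > >     node.start();
> > > > >     node.waitForInitialCreate(10, TimeUnit.SECONDS);
> > > > >   }
> > > > > }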
> > > > >
> > > > >
> > > > > On Fri, Mar 21, 2014 at 7:51 AM, Jun Rao <jun...@gmail.com> wrote:
> > > > >
> > > > > > Hi, Jae,
> > > > > >
> > > > > > My understanding is that ephemeral nodes can only be lost on ZK
> > > > > > session expiration. Are you saying that there are other cases when
> > > > > > ephemeral nodes can be lost?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Thu, Mar 20, 2014 at 9:52 PM, Bae, Jae Hyeon <metac...@gmail.com> wrote:
> > > > > >
> > > > > > > This issue is zookeeper resiliency.
> > > > > > >
> > > > > > > What I have done is replace ephemeral node creation with Apache
> > > > > > > Curator's PersistentEphemeralNode recipe, so that ephemeral nodes
> > > > > > > are reinstated after a zookeeper blip. All watchers should also
> > > > > > > be reinstated. Kafka internally only handles the session-expired
> > > > > > > event, but there are many bad zookeeper situations where Kafka
> > > > > > > can go haywire.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Mar 20, 2014 at 9:23 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> > > > > > >
> > > > > > > > 1.  What does 'zookeeper state changed (Expired)' mean?
> > > > > > > >
> > > > > > > > 2.  Has anyone seen issues like this before?  Where zookeeper
> > > > > > > > connections are flaky enough to cause leader elections?
> > > > > > > >
> > > > > > > > It means zookeeper expired the session. The most common reason
> > > > > > > > for this is client-side GC (in your case, the client is the
> > > > > > > > Kafka broker), though if your zookeeper setup is flaky, this
> > > > > > > > might even point to a server-side I/O bottleneck or GC. Could
> > > > > > > > you inspect the broker's and zookeeper's gc logs?
> > > > > > > >
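> > > > > > > > Also, the logs below show a negotiated session timeout of
> > > > > > > > 6000 ms.  As a stopgap while the GC/IO cause is tracked down,
> > > > > > > > the broker's timeout can be raised in server.properties (the
> > > > > > > > 30000 here is only an illustration, not a recommendation):
> > > > > > > >
> > > > > > > >   # broker defaults are 6000 ms; longer timeouts tolerate longer pauses
> > > > > > > >   zookeeper.session.timeout.ms=30000
> > > > > > > >   zookeeper.connection.timeout.ms=30000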
> > > > > > > >
> > > > > > > > On Thu, Mar 20, 2014 at 3:26 PM, Andrew Otto <o...@wikimedia.org> wrote:
> > > > > > > >
> > > > > > > > > Hm, just saw something a little fishy.
> > > > > > > > >
> > > > > > > > > (In the following logs, analytics1021 (id 21) and
> > > > > > > > > analytics1022 (id 22) are Brokers, and analytics1023,
> > > > > > > > > analytics1024, and analytics1025 are Zookeepers.)
> > > > > > > > >
> > > > > > > > > At 2014-03-20 21:12:26, analytics1021 lost its connection to
> > > > > > > > > zookeeper.  It reconnected to analytics1023, but as it was
> > > > > > > > > doing so, it logged a 'zookeeper state changed (Expired)'
> > > > > > > > > message, which apparently caused the newly opened socket to
> > > > > > > > > close again.  Am I reading that right?
> > > > > > > > >
> > > > > > > > > In the time that it took analytics1021 to finally re-establish
> > > > > > > > > a zookeeper connection, analytics1022 had become the leader
> > > > > > > > > for all partitions.  I can rebalance the leaders, but I'd
> > > > > > > > > prefer if this didn't happen in the first place.
> > > > > > > > >
> > > > > > > > > 1.  What does 'zookeeper state changed (Expired)' mean?
> > > > > > > > > 2.  Has anyone seen issues like this before?  Where zookeeper
> > > > > > > > > connections are flaky enough to cause leader elections?
> > > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > > -Andrew Otto
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > [2014-03-20 21:12:26,427] 685698013 [main-SendThread(analytics1025.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 10067ms for sessionid 0x1425d903a3f1f89, closing socket connection and attempting reconnect
> > > > > > > > > [2014-03-20 21:12:26,528] 685698114 [main-EventThread] INFO org.I0Itec.zkclient.ZkClient - zookeeper state changed (Disconnected)
> > > > > > > > > [2014-03-20 21:12:27,252] 685698838 [main-SendThread(analytics1025.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server analytics1023.eqiad.wmnet/10.64.5.15:2181
> > > > > > > > > [2014-03-20 21:12:27,271] 685698857 [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to analytics1023.eqiad.wmnet/10.64.5.15:2181, initiating session
> > > > > > > > > [2014-03-20 21:12:27,273] 685698859 [main-EventThread] INFO org.I0Itec.zkclient.ZkClient - zookeeper state changed (Expired)
> > > > > > > > > [2014-03-20 21:12:27,273] 685698859 [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x1425d903a3f1f89 has expired, closing socket connection
> > > > > > > > > [2014-03-20 21:12:27,273] 685698859 [main-EventThread] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@63203b59
> > > > > > > > > [2014-03-20 21:12:27,290] 685698876 [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad] INFO kafka.server.KafkaZooKeeper - re-registering broker info in ZK for broker 21
> > > > > > > > > [2014-03-20 21:12:27,290] 685698876 [main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down
> > > > > > > > > [2014-03-20 21:12:27,290] 685698876 [main-SendThread()] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server analytics1023.eqiad.wmnet/10.64.5.15:2181
> > > > > > > > > [2014-03-20 21:12:27,292] 685698878 [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to analytics1023.eqiad.wmnet/10.64.5.15:2181, initiating session
> > > > > > > > > [2014-03-20 21:12:27,313] 685698899 [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server analytics1023.eqiad.wmnet/10.64.5.15:2181, sessionid = 0xff429b4201dd2352, negotiated timeout = 6000
> > > > > > > > > [2014-03-20 21:12:27,314] 685698900 [main-EventThread] INFO org.I0Itec.zkclient.ZkClient - zookeeper state changed (SyncConnected)
> > > > > > > > > [2014-03-20 21:12:27,324] 685698910 [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad] INFO kafka.server.KafkaZooKeeper - done re-registering broker
> > > > > > > > > [2014-03-20 21:12:27,324] 685698910 [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad] INFO kafka.server.KafkaZooKeeper - Subscribing to /brokers/topics path to watch for new topics
> > > > > > > > > [2014-03-20 21:12:27,370] 685698956 [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad] INFO kafka.server.ZookeeperLeaderElector$LeaderChangeListener - New leader is 22