Ah! That makes so much sense, and is likely the cause of our lost messages.
Thanks, I am now experimenting with other ack values. I’m a little worried about latency, especially since some of our producers send traffic across the Atlantic (we don’t use MirrorMaker :/ ).
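Concretely, here is roughly what I'm trying first via our varnishkafka config (these are librdkafka property names; the values are illustrative while I measure the latency impact):

  # wait for the full ISR to ack instead of just the partition leader
  kafka.topic.request.required.acks = -1
  # give acks more time to come back over the slower trans-Atlantic links
  kafka.topic.request.timeout.ms = 10000
  # retry sends that fail or time out instead of dropping them
  kafka.message.send.max.retries = 3

My understanding is that acks = -1 should survive a single leader failure without losing acked messages, at the cost of an extra replication round trip on every produce request, which is exactly the latency I'm worried about.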
On Oct 1, 2014, at 10:14 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:

> But, as long as the broker with the unreplicated messages comes back into the ISR, those messages should be replicated, no?
>
> Or, would the fact that a new leader has been chosen for a partition, and new messages have been sent to that new leader, cause the offending broker to drop the unreplicated messages when it comes back into the ISR?
>
> The moment the leader moves to another broker, the partition's source of truth is the new broker's log and other followers truncate their logs to follow the new leader. So, any unreplicated messages that didn't reach the new leader are lost. If the old leader rejoins the ISR, it will also truncate its log to follow the new leader's log.
>
> Thanks,
> Neha
>
> On Wed, Oct 1, 2014 at 5:48 AM, Andrew Otto <ao...@wikimedia.org> wrote:
>
>> I understand that, but even if the leader quickly (within a few seconds) rejoins the cluster? I had thought that ack=1 meant that messages not yet replicated from a broker that has a serious fatal failure (disk corruption, etc.) would be lost forever. But, as long as the broker with the unreplicated messages comes back into the ISR, those messages should be replicated, no?
>>
>> Or, would the fact that a new leader has been chosen for a partition, and new messages have been sent to that new leader, cause the offending broker to drop the unreplicated messages when it comes back into the ISR?
>>
>> On Sep 30, 2014, at 7:17 PM, Jun Rao <jun...@gmail.com> wrote:
>>
>>> With ack=1, acked messages could be lost when the leader fails.
>>>
>>> Thanks,
>>>
>>> Jun
>>>
>>> On Mon, Sep 29, 2014 at 8:04 AM, Andrew Otto <o...@wikimedia.org> wrote:
>>>
>>>> This happened again to me this weekend. I've done some sleuthing, and I can definitely see some crazy paging stats when this lock-up happens. For the curious, more info can be found here: https://bugzilla.wikimedia.org/show_bug.cgi?id=69667. I had tuned dirty_expire_centisecs from 30 seconds to 10, but this still happened again. I'll continue to troubleshoot and tune. This is slow going because it is not regularly reproducible; I have to make a single change and then wait a week or two for the timeout to occur.
>>>>
>>>> Here's a related question. When the timeout happens, we lose some messages. Our producer is varnishkafka, which uses the librdkafka producer client. librdkafka keeps track of produce errors. We have kafka.topic.request.required.acks = 1. According to librdkafka, all messages sent have been ACKed by the leader of the partition to which the messages are sent. Also, when we lose messages due to this timeout, the broker that times out is always the controller. When it attempts to reconnect to Zookeeper, we see:
>>>>
>>>> INFO kafka.utils.ZkUtils$ - conflict in /controller data: {"version":1,"brokerid":21,"timestamp":"1411879981756"} stored data: {"version":1,"brokerid":22,"timestamp":"1407187809296"}
>>>>
>>>> In the case when a controller drops out of the ISR for a few seconds, is it possible for this confused broker to drop ACKed messages?
>>>>
>>>> On Thu, Jul 3, 2014 at 12:48 AM, Jun Rao <jun...@gmail.com> wrote:
>>>>
>>>>> Are you on Linux? We have seen this pattern (user/sys time low and real time high in GC time) before. In our case, the problem was due to disk I/Os. When there are lots of dirty pages (in our case, caused by log4j logging), Linux can draft user threads (in this case GC threads) to flush the dirty pages. So, all that time in real was spent on disk I/Os rather than real GCs. The fix is to tune dirty_expire_centisecs and dirty_writeback_centisecs to flush dirty pages more frequently and avoid such drafting.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jun
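An aside for the archives: the knobs Jun is talking about here are plain vm sysctls. The dirty_expire change below is the one I actually made, going from the ~30 second default down to 10 seconds; the writeback value is only an example of flushing in smaller batches, not something I have validated:

  # sketch for /etc/sysctl.d/ (values are in centiseconds)
  # expire dirty pages after 10s instead of the 30s default
  vm.dirty_expire_centisecs = 1000
  # wake the writeback threads every 2s (default 5s) so flushes happen in smaller chunks
  vm.dirty_writeback_centisecs = 200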
>>>>> On Wed, Jul 2, 2014 at 1:32 PM, Andrew Otto <o...@wikimedia.org> wrote:
>>>>>
>>>>>> Hi again!
>>>>>>
>>>>>> I've been having this issue consistently since I first started this thread, but it was happening infrequently enough for me to brush it aside and just run an election to rebalance brokers.
>>>>>>
>>>>>> I recently expanded (and reinstalled) our Kafka cluster so that it now has 4 brokers with a default replication factor of 3 for each partition. I also switched over to the G1GC as recommended here: https://kafka.apache.org/081/ops.html (even though we are still running Kafka 0.8.0; we hope to upgrade soon).
>>>>>>
>>>>>> Now, only one of the 4 brokers (analytics1021, the same problem broker we saw before) gets its ZK connection expired, and even more frequently than before. Previously it was less than once a week; now I am seeing this happen multiple times a day.
>>>>>>
>>>>>> I've posted all the relevant logs from a recent event here: https://gist.github.com/ottomata/e42480446c627ea0af22
>>>>>>
>>>>>> This includes the GC log on the offending Kafka broker during the time this happened. I am pretty green when it comes to GC tuning, but I do see this interesting stat:
>>>>>>
>>>>>> [Times: user=0.14 sys=0.00, real=11.47 secs]
>>>>>>
>>>>>> Did Kafka's JVM really just take 11.47 secs to do a GC there? I'm probably missing something, but I don't see which part of that summary accounts for the bulk of that real time.
>>>>>>
>>>>>> This is strange, right? This broker is identically configured to all of its peers, and should be handling on average the exact same amount and type of traffic. Anyone have any advice?
>>>>>>
>>>>>> Thanks!
>>>>>> -Andrew Otto
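An aside with the benefit of hindsight: that user=0.14 sys=0.00, real=11.47 line was the giveaway. If the JVM had really done 11 seconds of GC work, user time would have been high as well; near-zero user/sys with a huge real time means the GC threads were stalled waiting on something outside the JVM, which matches Jun's disk I/O explanation above. GC logging options along these lines make such stalls easier to spot (the log path is only an example, and how the flags get passed depends on your init scripts):

  # HotSpot GC logging flags (illustrative)
  -Xloggc:/var/log/kafka/kafka-gc.log
  -verbose:gc
  -XX:+PrintGCDetails
  -XX:+PrintGCDateStamps
  # reports total application stopped time, including time spent reaching a safepoint
  -XX:+PrintGCApplicationStoppedTime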
>>>>>> On Fri, Mar 21, 2014 at 6:48 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
>>>>>>
>>>>>>> I see, that makes sense. Thanks a lot for clarifying!
>>>>>>>
>>>>>>> -Neha
>>>>>>>
>>>>>>> On Fri, Mar 21, 2014 at 11:01 AM, Bae, Jae Hyeon <metac...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Let me clarify the situation. I forgot to mention that my case might not be a general one, because Netflix is using Apache Curator as the main zookeeper client and ZkClient in Kafka has to be bridged to Apache Curator, so the behavior I have seen might not be typical.
>>>>>>>>
>>>>>>>> Kafka's ZKSessionExpireListener.handleNewSession() is supposed to reinstate all ephemeral nodes and watchers, but handleNewSession() was never kicked off in my case. So, I created a Netflix-internal version of ZkClient to replace the ephemeral node creation and watcher reinstating.
>>>>>>>>
>>>>>>>> I have a plan to remove all external dependencies from Kafka soon, and then I might be able to say more about Kafka's zk resiliency.
>>>>>>>>
>>>>>>>> On Fri, Mar 21, 2014 at 7:51 AM, Jun Rao <jun...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi, Jae,
>>>>>>>>>
>>>>>>>>> My understanding is that ephemeral nodes can only be lost on ZK session expiration. Are you saying that there are other cases when ephemeral nodes can be lost?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Jun
>>>>>>>>>
>>>>>>>>> On Thu, Mar 20, 2014 at 9:52 PM, Bae, Jae Hyeon <metac...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> This issue is zookeeper resiliency.
>>>>>>>>>>
>>>>>>>>>> What I have done is replace ephemeral node creation with Apache Curator's PersistentEphemeralNode recipe, to reinstate ephemeral nodes after a zookeeper blip. All watchers should also be reinstated. Kafka internally only handles the session expired event, but there are many bad zookeeper situations in which Kafka can go haywire.
>>>>>>>>>>
>>>>>>>>>> On Thu, Mar 20, 2014 at 9:23 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> 1. What does 'zookeeper state changed (Expired)' mean?
>>>>>>>>>>>
>>>>>>>>>>> 2. Has anyone seen issues like this before? Where zookeeper connections are flaky enough to cause leader elections?
>>>>>>>>>>>
>>>>>>>>>>> It means zookeeper expired the session. The most common reason for this is client-side GC (in your case, the client is the Kafka broker), though if your zookeeper setup is flaky, this might even point to a server-side I/O bottleneck or GC. Could you inspect the broker's and zookeeper's gc logs?
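One more note for the archives: the 'negotiated timeout = 6000' in the logs below is the ZooKeeper session timeout the broker requests, i.e. Kafka's zookeeper.session.timeout.ms default of 6000 ms. Raising it only papers over multi-second stalls like the one above rather than fixing them, but for reference these are the broker settings involved (values illustrative, not a recommendation):

  # server.properties sketch; the default session timeout is 6000 ms
  zookeeper.session.timeout.ms=30000
  zookeeper.connection.timeout.ms=30000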
>>>>>>>>>>> On Thu, Mar 20, 2014 at 3:26 PM, Andrew Otto <o...@wikimedia.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hm, just saw something a little fishy.
>>>>>>>>>>>>
>>>>>>>>>>>> (In the following logs, analytics1021 (id 21) and analytics1022 (id 22) are Brokers and analytics1023, analytics1024, analytics1025 are Zookeepers.)
>>>>>>>>>>>>
>>>>>>>>>>>> At 2014-03-20 21:12:26, analytics1021 lost its connection to zookeeper. It reconnected to analytics1023, but as it was doing so, it logged a 'zookeeper state changed (Expired)' message, which apparently caused the newly opened socket to close again. Am I reading that right?
>>>>>>>>>>>>
>>>>>>>>>>>> In the time that it took analytics1021 to finally re-establish a zookeeper connection, analytics1022 had become the leader for all partitions. I can rebalance the leaders, but I'd prefer if this didn't happen in the first place.
>>>>>>>>>>>>
>>>>>>>>>>>> 1. What does 'zookeeper state changed (Expired)' mean?
>>>>>>>>>>>> 2. Has anyone seen issues like this before? Where zookeeper connections are flaky enough to cause leader elections?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> -Andrew Otto
>>>>>>>>>>>>
>>>>>>>>>>>> [2014-03-20 21:12:26,427] 685698013 [main-SendThread(analytics1025.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 10067ms for sessionid 0x1425d903a3f1f89, closing socket connection and attempting reconnect
>>>>>>>>>>>> [2014-03-20 21:12:26,528] 685698114 [main-EventThread] INFO org.I0Itec.zkclient.ZkClient - zookeeper state changed (Disconnected)
>>>>>>>>>>>> [2014-03-20 21:12:27,252] 685698838 [main-SendThread(analytics1025.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server analytics1023.eqiad.wmnet/10.64.5.15:2181
>>>>>>>>>>>> [2014-03-20 21:12:27,271] 685698857 [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to analytics1023.eqiad.wmnet/10.64.5.15:2181, initiating session
>>>>>>>>>>>> [2014-03-20 21:12:27,273] 685698859 [main-EventThread] INFO org.I0Itec.zkclient.ZkClient - zookeeper state changed (Expired)
>>>>>>>>>>>> [2014-03-20 21:12:27,273] 685698859 [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x1425d903a3f1f89 has expired, closing socket connection
>>>>>>>>>>>> [2014-03-20 21:12:27,273] 685698859 [main-EventThread] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@63203b59
>>>>>>>>>>>> [2014-03-20 21:12:27,290] 685698876 [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad] INFO kafka.server.KafkaZooKeeper - re-registering broker info in ZK for broker 21
>>>>>>>>>>>> [2014-03-20 21:12:27,290] 685698876 [main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down
>>>>>>>>>>>> [2014-03-20 21:12:27,290] 685698876 [main-SendThread()] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server analytics1023.eqiad.wmnet/10.64.5.15:2181
>>>>>>>>>>>> [2014-03-20 21:12:27,292] 685698878 [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to analytics1023.eqiad.wmnet/10.64.5.15:2181, initiating session
>>>>>>>>>>>> [2014-03-20 21:12:27,313] 685698899 [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server analytics1023.eqiad.wmnet/10.64.5.15:2181, sessionid = 0xff429b4201dd2352, negotiated timeout = 6000
>>>>>>>>>>>> [2014-03-20 21:12:27,314] 685698900 [main-EventThread] INFO org.I0Itec.zkclient.ZkClient - zookeeper state changed (SyncConnected)
>>>>>>>>>>>> [2014-03-20 21:12:27,324] 685698910 [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad] INFO kafka.server.KafkaZooKeeper - done re-registering broker
>>>>>>>>>>>> [2014-03-20 21:12:27,324] 685698910 [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad] INFO kafka.server.KafkaZooKeeper - Subscribing to /brokers/topics path to watch for new topics
>>>>>>>>>>>> [2014-03-20 21:12:27,370] 685698956 [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad] INFO kafka.server.ZookeeperLeaderElector$LeaderChangeListener - New leader is 22