Hi all,

I have collected our logs and pared them down. Jiangjie, I sent the more 
complete logs directly to your email address from the list (minus the 
.INVALID suffix). Please let me know if you did not get them. I cannot post 
them here in public.

For others, and to continue the conversation: during the initial shutdown, the 
controlled (clean) shutdown failed due to one corrupt topic. The message in 
server.log was:

[2015-03-31 10:21:46,250] INFO [Kafka Server 6], Remaining partitions to move: 
[__samza_checkpoint_ver_1_for_usersessions_1,0] (kafka.server.KafkaServer)
[2015-03-31 10:21:46,250] INFO [Kafka Server 6], Error code from controller: 0 
(kafka.server.KafkaServer)

And the related messages in state-change.log:

[2015-03-31 10:21:42,622] TRACE Controller 6 epoch 23 started leader election 
for partition [__samza_checkpoint_ver_1_for_usersessions_1,0] 
(state.change.logger)
[2015-03-31 10:21:42,623] ERROR Controller 6 epoch 23 encountered error while 
electing leader for partition [__samza_checkpoint_ver_1_for_usersessions_1,0] 
due to: LeaderAndIsr information doesn't exist for partition 
[__samza_checkpoint_ver_1_for_usersessions_1,0] in OnlinePartition state. 
(state.change.logger)
[2015-03-31 10:21:42,623] TRACE Controller 6 epoch 23 received response 
correlationId 2360 for a request sent to broker id:8,host:xxxxxxx,port:9092 
(state.change.logger) 
[2015-03-31 10:21:42,623] ERROR Controller 6 epoch 23 initiated state change 
for partition [__samza_checkpoint_ver_1_for_usersessions_1,0] from 
OnlinePartition to OnlinePartition failed (state.change.logger)
kafka.common.StateChangeFailedException: encountered error while electing 
leader for partition [__samza_checkpoint_ver_1_for_usersessions_1,0] due to: 
LeaderAndIsr information doesn't exist for partition 
[__samza_checkpoint_ver_1_for_usersessions_1,0] in OnlinePartition state.
        at 
kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:360)
        at 
kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:187)
        at 
kafka.controller.PartitionStateMachine$$anonfun$handleStateChanges$2.apply(PartitionStateMachine.scala:125)
        at 
kafka.controller.PartitionStateMachine$$anonfun$handleStateChanges$2.apply(PartitionStateMachine.scala:124)
        at scala.collection.immutable.Set$Set1.foreach(Set.scala:86)
        at 
kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:124)
        at 
kafka.controller.KafkaController$$anonfun$shutdownBroker$3$$anonfun$apply$1$$anonfun$apply$mcV$sp$3.apply(KafkaController.scala:257)
        at 
kafka.controller.KafkaController$$anonfun$shutdownBroker$3$$anonfun$apply$1$$anonfun$apply$mcV$sp$3.apply(KafkaController.scala:253)
        at scala.Option.foreach(Option.scala:197)
        at 
kafka.controller.KafkaController$$anonfun$shutdownBroker$3$$anonfun$apply$1.apply$mcV$sp(KafkaController.scala:253)
        at 
kafka.controller.KafkaController$$anonfun$shutdownBroker$3$$anonfun$apply$1.apply(KafkaController.scala:253)
        at 
kafka.controller.KafkaController$$anonfun$shutdownBroker$3$$anonfun$apply$1.apply(KafkaController.scala:253)
        at kafka.utils.Utils$.inLock(Utils.scala:538)
        at 
kafka.controller.KafkaController$$anonfun$shutdownBroker$3.apply(KafkaController.scala:252)
        at 
kafka.controller.KafkaController$$anonfun$shutdownBroker$3.apply(KafkaController.scala:249)
        at 
scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:130)
        at 
scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:275)
        at 
scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:275)
        at 
kafka.controller.KafkaController.shutdownBroker(KafkaController.scala:249)
        at 
kafka.server.KafkaApis.handleControlledShutdownRequest(KafkaApis.scala:264)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:192)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:42)
        at java.lang.Thread.run(Thread.java:744)
Caused by: kafka.common.StateChangeFailedException: LeaderAndIsr information 
doesn't exist for partition [__samza_checkpoint_ver_1_for_usersessions_1,0] in 
OnlinePartition state
        at 
kafka.controller.PartitionStateMachine.getLeaderIsrAndEpochOrThrowException(PartitionStateMachine.scala:391)
        at 
kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:324)
        ... 22 more

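In case it's useful for anyone digging through similar state-change logs, here 
is a rough helper to pull out every partition hitting that election error. The 
regex is keyed off the exact message text in the excerpt above; other broker 
versions may word it differently, so treat it as a sketch:

```python
import re

# Matches the exact error wording seen in our state-change.log above;
# captures the topic name and partition number.
PATTERN = re.compile(
    r"LeaderAndIsr information doesn't exist for partition \[([^,\]]+),(\d+)\]"
)

def failed_partitions(lines):
    """Return the unique (topic, partition) pairs that hit the error."""
    found = set()
    for line in lines:
        m = PATTERN.search(line)
        if m:
            found.add((m.group(1), int(m.group(2))))
    return sorted(found)

# One of the error lines from above, as a smoke test.
sample = [
    "[2015-03-31 10:21:42,623] ERROR Controller 6 epoch 23 encountered error "
    "while electing leader for partition "
    "[__samza_checkpoint_ver_1_for_usersessions_1,0] due to: LeaderAndIsr "
    "information doesn't exist for partition "
    "[__samza_checkpoint_ver_1_for_usersessions_1,0] in OnlinePartition state.",
]
print(failed_partitions(sample))
# -> [('__samza_checkpoint_ver_1_for_usersessions_1', 0)]
```

Against a real broker you'd feed it the log file, e.g. 
failed_partitions(open("state-change.log")), to see whether more than one 
partition is stuck before retrying the shutdown.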

Then when starting up, we got many angry messages like these:

[2015-03-31 10:24:10,960] WARN [Replica Manager on Broker 6]: Fetch request 
with correlation id 159883938 from client ReplicaFetcherThread-0-6 on partition 
[__samza_checkpoint_ver_1_for_usersessions_1,0] failed due to Partition 
[__samza_checkpoint_ver_1_for_usersessions_1,0] doesn't exist on 6 
(kafka.server.ReplicaManager)
[2015-03-31 10:24:10,989] WARN [Replica Manager on Broker 6]: While recording 
the follower position, the partition 
[__samza_checkpoint_ver_1_for_usersessions_1,0] hasn't been created, skip 
updating leader HW (kafka.server.ReplicaManager)
[2015-03-31 10:24:10,989] WARN [Replica Manager on Broker 6]: While recording 
the follower position, the partition 
[__samza_checkpoint_ver_1_for_usersessions_1,0] hasn't been created, skip 
updating leader HW (kafka.server.ReplicaManager)

Then they change to these:

[2015-03-31 10:24:12,008] WARN [Replica Manager on Broker 6]: Fetch request 
with correlation id 159884643 from client ReplicaFetcherThread-0-6 on partition 
[__samza_checkpoint_ver_1_for_usersessions_1,0] failed due to Partition 
[__samza_checkpoint_ver_1_for_usersessions_1,0] doesn't exist on 6 
(kafka.server.ReplicaManager)
[2015-03-31 10:24:12,009] WARN [Replica Manager on Broker 6]: While recording 
the follower position, the partition 
[__samza_checkpoint_ver_1_for_usersessions_1,0] hasn't been created, skip 
updating leader HW (kafka.server.ReplicaManager)
[2015-03-31 10:24:12,009] WARN [Replica Manager on Broker 6]: Fetch request 
with correlation id 158861852 from client ReplicaFetcherThread-0-6 on partition 
[__samza_checkpoint_ver_1_for_usersessions_1,0] failed due to Partition 
[__samza_checkpoint_ver_1_for_usersessions_1,0] doesn't exist on 6 
(kafka.server.ReplicaManager)
[2015-03-31 10:24:12,009] WARN [Replica Manager on Broker 6]: While recording 
the follower position, the partition 
[__samza_checkpoint_ver_1_for_usersessions_1,0] hasn't been created, skip 
updating leader HW (kafka.server.ReplicaManager)

And a bunch like this:


[2015-03-31 10:24:12,019] ERROR [KafkaApi-6] error when handling request Name: 
FetchRequest; Version: 0; CorrelationId: 158861855; ClientId: 
ReplicaFetcherThread-0-6; ReplicaId: 8; MaxWait: 500 ms; MinBytes: 1 bytes; 
RequestInfo: [__samza_checkpoint_ver_1_for_usersessions_1,0] -> 
PartitionFetchInfo(4461,1048576) (kafka.server.KafkaApis)
kafka.common.NotAssignedReplicaException: Leader 6 failed to record follower 
8's position -1 since the replica is not recognized to be one of the assigned 
replicas  for partition [__samza_checkpoint_ver_1_for_usersessions_1,0]
        at 
kafka.server.ReplicaManager.updateReplicaLEOAndPartitionHW(ReplicaManager.scala:574)
        at 
kafka.server.KafkaApis$$anonfun$recordFollowerLogEndOffsets$2.apply(KafkaApis.scala:388)
        at 
kafka.server.KafkaApis$$anonfun$recordFollowerLogEndOffsets$2.apply(KafkaApis.scala:386)
        at 
scala.collection.immutable.MapLike$$anon$2$$anonfun$foreach$3.apply(MapLike.scala:109)
        at 
scala.collection.immutable.MapLike$$anon$2$$anonfun$foreach$3.apply(MapLike.scala:109)
        at scala.collection.immutable.Map$Map1.foreach(Map.scala:119)
        at scala.collection.immutable.MapLike$$anon$2.foreach(MapLike.scala:109)
        at 
kafka.server.KafkaApis.recordFollowerLogEndOffsets(KafkaApis.scala:386)
        at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:351)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:60)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:59)
        at java.lang.Thread.run(Thread.java:744)


And it starts truncating the logs (not to 0; I'm guessing this is part of 
becoming a follower?) ...
[2015-03-31 10:24:15,897] INFO [ReplicaFetcherManager on broker 6] Removed 
fetcher for partitions 
[my-store-changelog,1],[stage_vsw.avrodto.addelivery.internalstats.trafficshaperlog,3],+
 a bunch of partitions... (kafka.server.ReplicaFetcherManager)
[2015-03-31 10:24:16,013] INFO Truncating log my-store-changelog-1 to offset 
3736. (kafka.log.Log)
[2015-03-31 10:24:16,014] INFO Truncating log xyz.topic1-3 to offset 2930. 
(kafka.log.Log)
[2015-03-31 10:24:16,014] INFO Truncating log xyz.topic2-2 to offset 10176. 
(kafka.log.Log)
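
To expand on my parenthetical: my (possibly wrong) understanding is that when a 
replica becomes a follower it truncates its log to its last checkpointed high 
watermark before fetching from the leader, and a partition with no checkpoint 
entry falls back to offset 0, the KAFKA-1647 failure mode mentioned further 
down in this thread. A toy model of that decision, not the broker source:

```python
# Toy model (my reading of the 0.8.x behavior, not the actual broker code):
# a new follower truncates its log to the last checkpointed high watermark
# for that partition; with no checkpoint entry the target falls back to 0,
# which wipes the local copy on restart.
def truncate_target(checkpointed_hw, partition):
    """Offset a replica truncates to when it becomes a follower."""
    return checkpointed_hw.get(partition, 0)

# Offsets taken from the truncation lines above.
hw = {("my-store-changelog", 1): 3736, ("xyz.topic1", 3): 2930}
print(truncate_target(hw, ("my-store-changelog", 1)))  # -> 3736
print(truncate_target(hw, ("pageview", 0)))            # -> 0 (no checkpoint)
```

Which would explain why our truncations here are to non-zero offsets, while the 
earlier incident (with the "No checkpointed highwatermark" warnings) truncated 
everything to 0.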

This continues for many (maybe all) partitions, and then... it starts deleting 
logs!?

[2015-03-31 10:24:38,449] INFO [ReplicaFetcherManager on broker 6] Removed 
fetcher for partitions [my-store-changelog,1],[xyz.topic1,1],... 
(kafka.server.ReplicaFetcherManager)
[2015-03-31 10:24:40,700] INFO Scheduling log segment 6587 for log xyz.topic1-2 
for deletion. (kafka.log.Log)
[2015-03-31 10:24:40,701] INFO Scheduling log segment 6594 for log xyz.topic1-2 
for deletion. (kafka.log.Log)
[2015-03-31 10:24:40,701] INFO Scheduling log segment 6595 for log xyz.topic1-2 
for deletion. (kafka.log.Log)
[2015-03-31 10:24:40,702] INFO Scheduling log segment 203953 for log 
xyz.topic2-4 for deletion. (kafka.log.Log)
[2015-03-31 10:24:40,702] INFO Scheduling log segment 210571 for log 
xyz.topic2-4 for deletion. (kafka.log.Log)
[2015-03-31 10:24:40,702] INFO Scheduling log segment 211471 for log 
xyz.topic2-4 for deletion. (kafka.log.Log)

Then it starts actually deleting them... this goes on for a good 20 minutes:

[2015-03-31 10:25:40,704] INFO Deleting segment 6587 from log xyz.topic1-2. 
(kafka.log.Log)
[2015-03-31 10:25:40,716] INFO Deleting index 
/data4/kafka-data/xyz.topic1-2/00000000000000006587.index.deleted 
(kafka.log.OffsetIndex)
[2015-03-31 10:25:40,716] INFO Deleting segment 6594 from log xyz.topic1-2. 
(kafka.log.Log)
[2015-03-31 10:25:40,717] INFO Deleting index 
/data4/kafka-data/xyz.topic1-2/00000000000000006594.index.deleted 
(kafka.log.OffsetIndex)
[2015-03-31 10:25:40,717] INFO Deleting segment 6595 from log xyz.topic1-2. 
(kafka.log.Log)
[2015-03-31 10:25:40,717] INFO Deleting index 
/data4/kafka-data/xyz.topic1-2/00000000000000006595.index.deleted 
(kafka.log.OffsetIndex)
[2015-03-31 10:25:40,717] INFO Deleting segment 203953 from log xyz.topic2-4. 
(kafka.log.Log)
[2015-03-31 10:25:40,722] INFO Deleting segment 210571 from log xyz.topic2-4. 
(kafka.log.Log)
[2015-03-31 10:25:40,729] INFO Deleting index 
/data4/kafka-data/xyz.topic2-4/00000000000000203953.index.deleted 
(kafka.log.OffsetIndex)
[2015-03-31 10:25:40,729] INFO Deleting segment 211471 from log xyz.topic2-4. 
(kafka.log.Log)
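
One small observation on the timestamps: each segment is actually removed 
almost exactly one minute after it was scheduled. That would line up with the 
broker's log.segment.delete.delay.ms default of 60000 ms (file.delete.delay.ms 
as a topic-level override), though that's an inference from the log, not 
something we've confirmed against this cluster's config:

```python
from datetime import datetime

# Timestamps copied from the "Scheduling log segment 6587" and matching
# "Deleting segment 6587" entries above; the gap is the scheduled-deletion
# delay, which we suspect is the 60000 ms broker default.
FMT = "%Y-%m-%d %H:%M:%S,%f"
scheduled = datetime.strptime("2015-03-31 10:24:40,700", FMT)
deleted = datetime.strptime("2015-03-31 10:25:40,704", FMT)
print((deleted - scheduled).total_seconds())  # -> 60.004
```

So the "Deleting segment" flood is just the delayed execution of the 
"Scheduling" decisions a minute earlier; the real question is why those 
segments were scheduled at all.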

I don't think we knew exactly what was happening at the time, only that the 
node was not syncing up with the others. I believe the sys-engineer stopped it 
after about 20 minutes to see what was wrong, and by that point the damage was 
done. Oddly, in these logs I don't see any of the "No checkpointed 
highwatermark is found for partition" messages in this sequence, so I'm not 
sure what's up there.
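
If it helps anyone check their own brokers before a restart, here's a quick 
parser for the replication-offset-checkpoint file that lives in each data 
directory. The format is inferred from what's on our disks (a version line, an 
entry count, then one "topic partition offset" per line), so treat it as an 
assumption rather than a documented contract:

```python
# Parse a replication-offset-checkpoint file (format inferred from our
# disks, not from documentation): line 1 is a version, line 2 an entry
# count, then "topic partition offset" entries. Useful for spotting a
# partition that is missing its high watermark entry before a restart.
def parse_checkpoint(text):
    lines = text.strip().splitlines()
    version, count = int(lines[0]), int(lines[1])
    entries = {}
    for line in lines[2:2 + count]:
        topic, partition, offset = line.rsplit(" ", 2)
        entries[(topic, int(partition))] = int(offset)
    return version, entries

# A made-up two-entry file for illustration.
sample = "0\n2\nmy-store-changelog 1 3736\npageview 0 91226773\n"
version, hw = parse_checkpoint(sample)
print(hw[("my-store-changelog", 1)])   # -> 3736
print(("pageimpression", 1) in hw)     # -> False
```

A partition that is live on the broker but absent from this file would be the 
kind of thing that produces the "No checkpointed highwatermark" warning on the 
next startup.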

Any ideas? It makes us more than a little nervous to restart nodes if they will 
just "decide" to delete segments. Under what conditions would this happen?

Thanks!
Thunder


-----Original Message-----
From: Thunder Stumpges [mailto:tstump...@ntent.com] 
Sent: Friday, April 03, 2015 12:10 PM
To: users@kafka.apache.org
Subject: RE: Problem with node after restart no partitions?

Likewise, I was not "at the wheel" when this was happening, and there very well 
could have been a similar situation of not waiting for a controlled shutdown to 
complete successfully.

Fortunately we did not end up in exactly your situation where the entire 
cluster went down, but I know we never had more than one of the three 
nodes shut down during this situation, and twice in two days we lost all data 
on the node and it had to re-sync ALL of its data over again.

I am in the process of trying to pull the controller and server logs for the 
first portion of our issue right now. Will follow up when they are available.

Thanks,
Thunder


-----Original Message-----
From: Jason Rosenberg [mailto:j...@squareup.com]
Sent: Friday, April 03, 2015 10:50 AM
To: users@kafka.apache.org
Subject: Re: Problem with node after restart no partitions?

I will provide what I can (we don't have separate logs for controller, etc., 
it's all integrated in a single log with log4j for us, we embed Kafka in a java 
container, etc).  Are there specific log classes you'd be interested in seeing? 
 (I can look at the default log4j configs to see what's set up normally for the 
'controller' log)....

We have most of the logs saved away (but not all of them).

Sorry, I didn't mean to hijack this thread (so will start a new thread 
soonly).....

Jason

On Fri, Apr 3, 2015 at 1:37 PM, Jiangjie Qin <j...@linkedin.com.invalid>
wrote:

> This sounds like a very serious issue... Could you provide the controller 
> log and the log for the first broker on which you tried the controlled 
> shutdown and upgrade?
>
> On 4/3/15, 8:57 AM, "Jason Rosenberg" <j...@squareup.com> wrote:
>
> >I'm preparing a longer post here, but we recently ran into a similar 
> >scenario.  Not sure yet if it's the same thing you saw (but it feels 
> >similar).  We were also doing a rolling upgrade from 0.8.1.1 to 
> >0.8.2.1, and during the controlled shutdown of the first node (of a 4 
> >node cluster), the controlled shutdown was taking longer than normal 
> >(it timed out several times and was retrying controlled shutdown), 
> >and unfortunately, our deployment system decided to kill it hard (so 
> >it was in the middle of its 4th controlled shutdown retry, etc.).
> >
> >Anyway, when the node came back, it naturally decided to 'restore' 
> >most of its partitions, which took some time (but only like 5 
> >minutes).  What's weird is it didn't decide to resync data from other 
> >replicas, instead it just restored partitions locally.  During this 
> >time, the rest of the cluster failed to elect any new leaders, and so 
> >for 5 minutes, those partitions were unavailable (and we saw a flood 
> >of failed FetcherManager exceptions from the other nodes in the 
> >cluster).  Most of the partitions were empty (e.g. there's no way the 
> >other replicas were behind and not in the ISR normally).  During this
> >5 minutes, producers were unable to send messages due to 
> >NotLeaderForPartition exceptions.  Apparently the controller was still 
> >sending them to the unavailable broker.
> >
> >Finally, when the first node finally came up, the other nodes were 
> >somewhat happy again (but a few partitions remained under-replicated 
> >indefinitely).
> >Because of this, we decided to pause the rolling restart, and try to 
> >wait for the under-replicated partitions to get insync.
> >Unfortunately, about an hour later, the whole cluster went foobar 
> >(e.g. partitions became unavailable, brokers logged a flood of 
> >Fetcher errors, producers couldn't find a valid leader, metadata 
> >requests timed out, etc.).  In a panic, we reverted that first node 
> >back to 0.8.1.1. This did not help, unfortunately, so, deciding we'd 
> >already probably lost data at this point (and producers could not 
> >send data due to (NotLeaderForPartition exceptions)), we decided to 
> >just forcibly do the upgrade to 0.8.2.1.  This was all a bad 
> >situation, of course.
> >
> >So, now we have the cluster stable at 0.8.2.1, but like you, we are 
> >very, very nervous about doing any kind of restart to any of our 
> >nodes.  We lost data, primarily in the form of producers failing to 
> >send during the periods of unavailability.
> >
> >It looks like the root cause, in our case, was a flood of topics 
> >created (long-since unused and empty).  This appears to have caused 
> >the longer than normal controlled shutdown, which in turn, led to the 
> >followon problems.
> >However, in the past, we've seen a controlled shutdown failure result 
> >in an unclean shutdown, but usually the cluster recovers (e.g. it 
> >elects new leaders, and when the new node comes back, it recovers 
> >its partitions that were uncleanly shut down).  That did not happen 
> >this time (the rest of the cluster got in an apparent infinite loop 
> >where it tried repeatedly (e.g.
> >500K times a minute) to fetch partitions that were unavailable).
> >
> >I'm preparing a longer post with more detail (will take a bit of time).
> >
> >Jason
> >
> >On Thu, Apr 2, 2015 at 10:19 PM, Gwen Shapira <gshap...@cloudera.com>
> >wrote:
> >
> >> wow, thats scary for sure.
> >>
> >> Just to be clear - all you did is restart *one* broker in the cluster?
> >> everything else was ok before the restart? and that was controlled 
> >> shutdown?
> >>
> >> Gwen
> >>
> >> On Wed, Apr 1, 2015 at 11:54 AM, Thunder Stumpges 
> >> <tstump...@ntent.com>
> >> wrote:
> >>
> >> > Well it appears we lost all the data on the one node again. It
> >>appears to
> >> > be all or part of KAFKA-1647<
> >> > https://issues.apache.org/jira/browse/KAFKA-1647> as we saw this 
> >> > in
> >>our
> >> > logs (for all topics):
> >> >
> >> > [2015-04-01 10:46:58,901] WARN Partition [logactivity-redirect,3] 
> >> > on broker 6: No checkpointed highwatermark is found for partition 
> >> > [logactivity-redirect,3] (kafka.cluster.Partition)
> >> > [2015-04-01 10:46:58,902] WARN Partition [pageimpression,1] on 
> >> > broker
> >>6:
> >> > No checkpointed highwatermark is found for partition
> >>[pageimpression,1]
> >> > (kafka.cluster.Partition)
> >> > [2015-04-01 10:46:58,904] WARN Partition 
> >> > [campaignplatformtarget,6] on broker 6: No checkpointed 
> >> > highwatermark is found for partition [campaignplatformtarget,6]
> >> > (kafka.cluster.Partition)
> >> > [2015-04-01 10:46:58,905] WARN Partition [trackingtags-c2,1] on
> >>broker 6:
> >> > No checkpointed highwatermark is found for partition
> >>[trackingtags-c2,1]
> >> > (kafka.cluster.Partition)
> >> >
> >> > Followed by:
> >> >
> >> > [2015-04-01 10:46:58,911] INFO Truncating log trafficshaperlog-3 
> >> > to
> >> offset
> >> > 0. (kafka.log.Log)
> >> > [2015-04-01 10:46:58,928] INFO Truncating log videorecrequest-0 
> >> > to
> >>offset
> >> > 0. (kafka.log.Log)
> >> > [2015-04-01 10:46:58,928] INFO Truncating log filteredredirect-2 
> >> > to
> >> offset
> >> > 0. (kafka.log.Log)
> >> > [2015-04-01 10:46:58,985] INFO Truncating log precheckrequest-3 
> >> > to
> >>offset
> >> > 0. (kafka.log.Log)
> >> > [2015-04-01 10:46:58,990] INFO Truncating log filteredclicklog-8 
> >> > to
> >> offset
> >> > 0. (kafka.log.Log)
> >> >
> >> > Followed by:
> >> > [2015-04-01 10:46:59,107] INFO Scheduling log segment 90276704 
> >> > for log
> >> > pageview-0 for deletion. (kafka.log.Log)
> >> > [2015-04-01 10:46:59,107] INFO Scheduling log segment 90593329 
> >> > for log
> >> > pageview-0 for deletion. (kafka.log.Log)
> >> > [2015-04-01 10:46:59,107] INFO Scheduling log segment 90906048 
> >> > for log
> >> > pageview-0 for deletion. (kafka.log.Log)
> >> > [2015-04-01 10:46:59,107] INFO Scheduling log segment 91226773 
> >> > for log
> >> > pageview-0 for deletion. (kafka.log.Log)
> >> >
> >> >
> >> > The strange thing however is that I don't believe we ever had the
> >> scenario
> >> > mentioned in the bug (all brokers for a topic down) As we have 0
> >>unclean
> >> > leader elections, our applications never complained the cluster 
> >> > was
> >>down,
> >> > and we never actually stopped more than one node (the node in
> >>question).
> >> >
> >> > Anyway, really unsure if this could have somehow been related to 
> >> > the attempted upgrade to 0.8.2.1 (the occurrence this morning was 
> >> > NOT, it
> >> was a
> >> > simple stop, restart broker)
> >> >
> >> > We are really nervous to do any kafka service restarts now that 
> >> > this
> >>has
> >> > happened twice on this machine.
> >> > Any suggestions? Should we go back and make another attempt to
> >>upgrade to
> >> > 0.8.2.1 ?
> >> >
> >> > Thanks,
> >> > Thunder
> >> >
> >> >
> >> > From: Thunder Stumpges
> >> > Sent: Wednesday, April 01, 2015 10:30 AM
> >> > To: kafka-us...@incubator.apache.org
> >> > Subject: Problem with node after restart no partitions?
> >> >
> >> > Hi group,
> >> >
> >> > We've been trying to track down a problem this morning for a 
> >> > little
> >> while,
> >> > and thought I'd ask here while we keep looking.
> >> >
> >> > We have 3 nodes (rep-3) running 8.1.1. We attempted a rolling 
> >> > upgrade yesterday to 8.2.1, and on the first node, after 
> >> > restarting, a single
> >> topic
> >> > (a samza intermediate topic) started throwing replica fetcher 
> >> > errors
> >>over
> >> > and over ("NotLeaderForPartition"). There may or may not have 
> >> > been
> >>other
> >> > things attempted at this time (not by me so I cannot say for sure).
> >> Anyway
> >> > we ended up rolling back to 8.1.1 and ALL data had been DELETED 
> >> > from
> >>that
> >> > node. It spent most of yesterday re-syncing, and came into sync 
> >> > last
> >> night,
> >> > and a rebalance made everything run smoothly (*except for these 
> >> > damn replica fetcher errors for that one partition).
> >> >
> >> > Today my colleague attempted the "unsupported" topic delete 
> >> > command
> >>for
> >> > the "bad" partition, and bounced that one troublesome node.
> >> >
> >> > Upon coming up, I can see in server.log that it is reading in all 
> >> > of
> >>the
> >> > segments in, and then starts spitting out a samza topic fetch 
> >> > error,
> >>and
> >> > through JMX the "ReplicaManager".LeaderCount is 0. It is not
> >>attempting
> >> to
> >> > fetch or load any topics.
> >> >
> >> > The other two brokers are showing under-replicated (obviously). 
> >> > What
> >>is
> >> > going wrong? How can we get that samza topic really and truly gone?
> >>(if
> >> > that is the cause of the broker not coming up)
> >> >
> >> > Thanks,
> >> > Thunder
> >> >
> >> >
> >>
>
>
