I did use Boron-SR3 and still managed to hit this issue after a series of
restarts. I will enable debug logging and file a bug after getting more info.
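(I'm assuming the usual Karaf way is the right one here, i.e. running
"log:set DEBUG org.opendaylight.controller.cluster.datastore.Shard" from the
console, or adding the equivalent logger entry in etc/org.ops4j.pax.logging.cfg.)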

In any case, it seems odd that we hit an NPE.
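
Looking at the snippet I quoted earlier (SnapshotManager.java in Boron-SR2,
around line 211), the only reference that can be null on that line is the
entry returned by context.getReplicatedLog().get(tempMin), so the comment's
assumption that the earlier isPresent check guarantees a non-null entry
apparently does not hold in this case. Below is a minimal, self-contained
sketch of one plausible way get() can come back null there despite the check;
the TrimLogSketch/FakeReplicatedLog names are made up for illustration (they
are not the real ODL types), and the null guard is only a hypothetical
defensive measure, not the actual upstream fix:

import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the suspected failure mode; FakeReplicatedLog is a
// stand-in for the real ReplicatedLog, not OpenDaylight code.
public class TrimLogSketch {

    static class FakeReplicatedLog {
        final Map<Long, Long> termByIndex = new HashMap<>();

        boolean isPresent(long index) {
            return termByIndex.containsKey(index);
        }

        Long getTerm(long index) {
            // Returns null if the entry has since been trimmed away.
            return termByIndex.get(index);
        }

        void snapshotAndTrimTo(long index) {
            termByIndex.keySet().removeIf(i -> i <= index);
        }
    }

    // Mirrors the shape of doTrimLog(): the Boron-SR2 code assumes the entry
    // cannot be null because an isPresent-style check happened earlier, then
    // calls entry.getTerm(), which is where the NPE in the trace fires.
    static long trimLog(FakeReplicatedLog log, long tempMin) {
        Long term = log.getTerm(tempMin);
        if (term == null) {
            // Hypothetical defensive handling, not the upstream fix.
            return -1;
        }
        // snapshotPreCommit(tempMin, term) / snapshotCommit() would follow here.
        return tempMin;
    }

    public static void main(String[] args) {
        FakeReplicatedLog log = new FakeReplicatedLog();
        log.termByIndex.put(5L, 2L);

        if (log.isPresent(5)) {          // the check the code comment relies on
            log.snapshotAndTrimTo(5);    // log shrinks between the check and the get
            System.out.println(trimLog(log, 5));  // prints -1 instead of throwing
        }
    }
}

Whatever the exact trigger is in the PreLeader case the trace shows, guarding
the get() result (or failing with a clearer message) would at least avoid the
bare NPE.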

On Fri, Apr 7, 2017 at 8:22 AM, Tom Pantelis <tompante...@gmail.com> wrote:

>
>
> On Wed, Apr 5, 2017 at 3:18 PM, Srini Seetharaman <
> srini.seethara...@gmail.com> wrote:
>
>> During this time, the shard leadership was in some weird state. I am not
>> sure what "PreLeader" means. Can you clarify?
>>
>>
> PreLeader is an intermediate state entered before Leader: on a leadership
> change it applies any still-uncommitted transactions before accepting new
> ones. I would suggest testing with Boron SR3, as it contains several
> changes/fixes. I would also suggest running with debug enabled for
> org.opendaylight.controller.cluster.datastore.Shard so there's a paper trail
> in case something goes wrong.
>
>
>> -------------- instance 1 --------------
>> member-1-shard-default-config: Follower
>> member-1-shard-default-operational: Follower
>> -------------- instance2 --------------
>> member-2-shard-default-config: Follower
>> member-2-shard-default-operational: PreLeader
>> -------------- instance 3 --------------
>> member-3-shard-default-config: Leader
>> member-3-shard-default-operational: Follower
>>
>>
>> On Wed, Apr 5, 2017 at 12:17 PM, Srini Seetharaman <
>> srini.seethara...@gmail.com> wrote:
>>
>>> Here is the code blurb from that SnapshotManager.java file in Boron-SR2:
>>>
>>> 209                 //use the term of the temp-min, since we check for isPresent, entry will not be null
>>> 210                 ReplicatedLogEntry entry = context.getReplicatedLog().get(tempMin);
>>> 211                 context.getReplicatedLog().snapshotPreCommit(tempMin, entry.getTerm());
>>> 212                 context.getReplicatedLog().snapshotCommit();
>>> 213                 return tempMin;
>>> 214             }
>>>
>>>
>>> On Wed, Apr 5, 2017 at 12:15 PM, Srini Seetharaman <
>>> srini.seethara...@gmail.com> wrote:
>>>
>>>>
>>>> Hi,
>>>> During one of my runs of bringing the cluster members' interfaces up and
>>>> down, I hit the following NPE after all three instances had been isolated
>>>> twice. Let me know if you need any more info besides the log below.
>>>>
>>>>
>>>> 2017-04-05 19:08:51,860 | WARN  | lt-dispatcher-28 | ConcurrentDOMDataBroker          | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | Tx: DOM-32 Error during phase CAN_COMMIT, starting Abort
>>>> org.opendaylight.controller.cluster.datastore.exceptions.NoShardLeaderException: Shard member-2-shard-default-operational currently has no leader. Try again later.
>>>>         at org.opendaylight.controller.cluster.datastore.shardmanager.ShardManager.createNoShardLeaderException(ShardManager.java:723)[193:org.opendaylight.controller.sal-distributed-datastore:1.4.2.Boron-SR2]
>>>>         at org.opendaylight.controller.cluster.datastore.shardmanager.ShardManager.onShardNotInitializedTimeout(ShardManager.java:537)[193:org.opendaylight.controller.sal-distributed-datastore:1.4.2.Boron-SR2]
>>>>         at org.opendaylight.controller.cluster.datastore.shardmanager.ShardManager.handleCommand(ShardManager.java:216)[193:org.opendaylight.controller.sal-distributed-datastore:1.4.2.Boron-SR2]
>>>>         at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:29)[187:org.opendaylight.controller.sal-clustering-commons:1.4.2.Boron-SR2]
>>>>         at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:170)[181:com.typesafe.akka.persistence:2.4.7]
>>>>         at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:97)[187:org.opendaylight.controller.sal-clustering-commons:1.4.2.Boron-SR2]
>>>>         at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)[175:com.typesafe.akka.actor:2.4.7]
>>>>         at akka.actor.Actor$class.aroundReceive(Actor.scala:484)[175:com.typesafe.akka.actor:2.4.7]
>>>>         at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:168)[181:com.typesafe.akka.persistence:2.4.7]
>>>>         at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:633)[181:com.typesafe.akka.persistence:2.4.7]
>>>>         at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:179)[181:com.typesafe.akka.persistence:2.4.7]
>>>>         at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:168)[181:com.typesafe.akka.persistence:2.4.7]
>>>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)[175:com.typesafe.akka.actor:2.4.7]
>>>>         at akka.actor.ActorCell.invoke(ActorCell.scala:495)[175:com.typesafe.akka.actor:2.4.7]
>>>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)[175:com.typesafe.akka.actor:2.4.7]
>>>>         at akka.dispatch.Mailbox.run(Mailbox.scala:224)[175:com.typesafe.akka.actor:2.4.7]
>>>>         at akka.dispatch.Mailbox.exec(Mailbox.scala:234)[175:com.typesafe.akka.actor:2.4.7]
>>>>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>>>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>>>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>>>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>>> 2017-04-05 19:08:51,863 | WARN  | tor-ComputeTimer | GenericTransactionUtils          | 301 - com.infinera.sdn.utils.transaction - 0.1.0.SNAPSHOT | Transaction for add of object State [_cpuInfo=CpuInfo [_processorCount=6, _usage=0.48, augmentation=[]], _memInfo=MemInfo [_memFree=138797056, _memTotal=12302811136, augmentation=[]], _status=class org.opendaylight.yang.gen.v1.urn.infinera.system.compute.rev160510.Running, augmentation=[]] failed with error {}
>>>> 2017-04-05 19:09:14,056 | INFO  | lt-dispatcher-35 | kka://opendaylight-cluster-data) | 176 - com.typesafe.akka.slf4j - 2.4.7 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.12:2550] - Leader is moving node [akka.tcp://opendaylight-cluster-data@172.17.0.11:2550] to [Up]
>>>> 2017-04-05 19:09:14,057 | INFO  | lt-dispatcher-35 | ShardManager                     | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | shard-manager-operational: Received MemberUp: memberName: MemberName{name=member-1}, address: akka.tcp://opendaylight-cluster-data@172.17.0.11:2550
>>>> 2017-04-05 19:09:14,057 | INFO  | lt-dispatcher-35 | ShardInformation                 | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | updatePeerAddress for peer member-1-shard-default-operational with address akka.tcp://opendaylight-cluster-data@172.17.0.11:2550/user/shardmanager-operational/member-1-shard-default-operational
>>>> 2017-04-05 19:09:14,057 | INFO  | lt-dispatcher-35 | ShardInformation                 | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | updatePeerAddress for peer member-1-shard-entity-ownership-operational with address akka.tcp://opendaylight-cluster-data@172.17.0.11:2550/user/shardmanager-operational/member-1-shard-entity-ownership-operational
>>>> 2017-04-05 19:09:14,058 | INFO  | lt-dispatcher-18 | ShardManager                     | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | shard-manager-config: Received MemberUp: memberName: MemberName{name=member-1}, address: akka.tcp://opendaylight-cluster-data@172.17.0.11:2550
>>>> 2017-04-05 19:09:14,058 | INFO  | lt-dispatcher-18 | ShardInformation                 | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | updatePeerAddress for peer member-1-shard-default-config with address akka.tcp://opendaylight-cluster-data@172.17.0.11:2550/user/shardmanager-config/member-1-shard-default-config
>>>> 2017-04-05 19:09:14,068 | INFO  | lt-dispatcher-18 | ShardManager                     | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | shard-manager-config: All Shards are ready - data store config is ready, available count is 0
>>>> 2017-04-05 19:09:14,068 | INFO  | lt-dispatcher-18 | Shard                            | 188 - org.opendaylight.controller.sal-akka-raft - 1.4.2.Boron-SR2 | Peer address for peer member-1-shard-default-config set to akka.tcp://opendaylight-cluster-data@172.17.0.11:2550/user/shardmanager-config/member-1-shard-default-config
>>>> 2017-04-05 19:09:14,063 | INFO  | lt-dispatcher-28 | EntityOwnershipShard             | 188 - org.opendaylight.controller.sal-akka-raft - 1.4.2.Boron-SR2 | Peer address for peer member-1-shard-entity-ownership-operational set to akka.tcp://opendaylight-cluster-data@172.17.0.11:2550/user/shardmanager-operational/member-1-shard-entity-ownership-operational
>>>> 2017-04-05 19:09:14,070 | INFO  | lt-dispatcher-33 | Shard                            | 188 - org.opendaylight.controller.sal-akka-raft - 1.4.2.Boron-SR2 | Peer address for peer member-1-shard-default-operational set to akka.tcp://opendaylight-cluster-data@172.17.0.11:2550/user/shardmanager-operational/member-1-shard-default-operational
>>>> 2017-04-05 19:11:31,513 | WARN  | lt-dispatcher-17 | OneForOneStrategy                | 176 - com.typesafe.akka.slf4j - 2.4.7 | null
>>>> 2017-04-05 19:11:31,514 | WARN  | lt-dispatcher-18 | ShardManager                     | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | Supervisor Strategy caught unexpected exception - resuming
>>>> java.lang.NullPointerException
>>>>         at org.opendaylight.controller.cluster.raft.SnapshotManager$AbstractSnapshotState.doTrimLog(SnapshotManager.java:211)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>>         at org.opendaylight.controller.cluster.raft.SnapshotManager$Idle.trimLog(SnapshotManager.java:293)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>>         at org.opendaylight.controller.cluster.raft.SnapshotManager.trimLog(SnapshotManager.java:91)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>>         at org.opendaylight.controller.cluster.raft.behaviors.AbstractRaftActorBehavior.performSnapshotWithoutCapture(AbstractRaftActorBehavior.java:470)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>>         at org.opendaylight.controller.cluster.raft.behaviors.AbstractLeader.purgeInMemoryLog(AbstractLeader.java:400)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>>         at org.opendaylight.controller.cluster.raft.behaviors.AbstractLeader.handleAppendEntriesReply(AbstractLeader.java:368)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>>         at org.opendaylight.controller.cluster.raft.behaviors.AbstractRaftActorBehavior.handleMessage(AbstractRaftActorBehavior.java:404)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>>         at org.opendaylight.controller.cluster.raft.behaviors.AbstractLeader.handleMessage(AbstractLeader.java:457)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>>         at org.opendaylight.controller.cluster.raft.behaviors.PreLeader.handleMessage(PreLeader.java:49)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>>         at org.opendaylight.controller.cluster.raft.RaftActor.possiblyHandleBehaviorMessage(RaftActor.java:302)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>>         at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:290)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>>         at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:29)[187:org.opendaylight.controller.sal-clustering-commons:1.4.2.Boron-SR2]
>>>>         at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:170)[181:com.typesafe.akka.persistence:2.4.7]
>>>>         at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:97)[187:org.opendaylight.controller.sal-clustering-commons:1.4.2.Boron-SR2]
>>>>         at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)[175:com.typesafe.akka.actor:2.4.7]
>>>>         at akka.actor.Actor$class.aroundReceive(Actor.scala:484)[175:com.typesafe.akka.actor:2.4.7]
>>>>         at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:168)[181:com.typesafe.akka.persistence:2.4.7]
>>>>         at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:633)[181:com.typesafe.akka.persistence:2.4.7]
>>>>         at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:179)[181:com.typesafe.akka.persistence:2.4.7]
>>>>         at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:168)[181:com.typesafe.akka.persistence:2.4.7]
>>>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)[175:com.typesafe.akka.actor:2.4.7]
>>>>         at akka.actor.ActorCell.invoke(ActorCell.scala:495)[175:com.typesafe.akka.actor:2.4.7]
>>>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)[175:com.typesafe.akka.actor:2.4.7]
>>>>         at akka.dispatch.Mailbox.run(Mailbox.scala:224)[175:com.typesafe.akka.actor:2.4.7]
>>>>         at akka.dispatch.Mailbox.exec(Mailbox.scala:234)[175:com.typesafe.akka.actor:2.4.7]
>>>>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>>>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>>>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>>>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>>>
>>>>
>>>
>>
>>
>
_______________________________________________
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev
