[
https://issues.apache.org/jira/browse/IGNITE-9398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612079#comment-16612079
]
Alexey Goncharuk commented on IGNITE-9398:
------------------------------------------
Also found this exception in TC run:
{code}
[13:13:12]W: [org.apache.ignite:ignite-core] [2018-09-12
10:13:12,088][ERROR][disco-event-worker-#37%dht.IgniteCachePutRetryAtomicSelfTest0%][GridDiscoveryManager]
Runtime error caught during grid runnable execution: DiscoveryWorker
[discoCache=DiscoCache [state=DiscoveryDataClusterState [active=true,
baselineTopology=null, transitionReqId=null, transitionTopVer=null,
transitionRes=null, prevState=null, transitionError=null], loc=TcpDiscoveryNode
[id=a9dc30a7-3e75-4247-9b89-1fb54e900000, addrs=ArrayList [127.0.0.1],
sockAddrs=HashSet [/127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
lastExchangeTime=1536747192016, loc=true, ver=2.7.0#20180912-sha1:10bd1c06,
isClient=false], rmtNodesWithCaches=UnmodifiableRandomAccessList [],
allCacheNodes=UnmodifiableMap {-2100569601=ArrayList [TcpDiscoveryNode
[id=a9dc30a7-3e75-4247-9b89-1fb54e900000, addrs=ArrayList [127.0.0.1],
sockAddrs=HashSet [/127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
lastExchangeTime=1536747192016, loc=true, ver=2.7.0#20180912-sha1:10bd1c06,
isClient=false]]}, cacheGrpAffNodes=UnmodifiableMap {-2100569601=ArrayList
[TcpDiscoveryNode [id=a9dc30a7-3e75-4247-9b89-1fb54e900000, addrs=ArrayList
[127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500], discPort=47500, order=1,
intOrder=1, lastExchangeTime=1536747192016, loc=true,
ver=2.7.0#20180912-sha1:10bd1c06, isClient=false]]},
minNodeVer=2.7.0#20180912-sha1:10bd1c06,
minSrvNodeVer=2.7.0#20180912-sha1:10bd1c06, topVer=AffinityTopologyVersion
[topVer=1, minorTopVer=0],
aliveBaselineNodePred=o.a.i.i.managers.discovery.DiscoCache$2@a20c648,
aliveNodePred=o.a.i.i.managers.discovery.DiscoCache$3@2c434393, mvccCrd=null],
restartProcHnd=RestartProcessFailureHandler [],
stopNodeHnd=StopNodeFailureHandler [], nodeSegFired=false]
[13:13:12]W: [org.apache.ignite:ignite-core]
java.lang.NullPointerException
[13:13:12]W: [org.apache.ignite:ignite-core] at
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body0(GridDiscoveryManager.java:2894)
[13:13:12]W: [org.apache.ignite:ignite-core] at
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body(GridDiscoveryManager.java:2864)
[13:13:12]W: [org.apache.ignite:ignite-core] at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
[13:13:12]W: [org.apache.ignite:ignite-core] at
java.lang.Thread.run(Thread.java:748)
{code}
which leads to Cache Failover 3 suite hang:
https://ci.ignite.apache.org/viewLog.html?buildId=1848678
> Reduce time on processing CustomDiscoveryMessage by discovery message worker
> ----------------------------------------------------------------------------
>
> Key: IGNITE-9398
> URL: https://issues.apache.org/jira/browse/IGNITE-9398
> Project: Ignite
> Issue Type: Bug
> Components: cache
> Affects Versions: 2.4, 2.5, 2.6
> Reporter: Pavel Kovalenko
> Assignee: Pavel Kovalenko
> Priority: Major
> Fix For: 2.7
>
>
> Processing discovery CustomMessage may take significant values of time
> (0.5-0.7 seconds) before sending to next node in the topology. This
> significantly accumulates the total time of PME if topology has multiple
> nodes.
> Let X = time of processing discovery message by discovery-msg-worker on each
> node before sending to next node.
> Let N = number of nodes in the topology.
> Then the minimal total time of exchange will be:
> T = N * X
> We shouldn't make heavy actions when process discovery message. Best solution
> will be separated thread that will do it, while discovery-msg-worker will
> just pass a message to that thread and send a message immediately to another
> node in topology.
> This affects both TcpDiscoverySpi and ZkDiscoverySpi.
> {noformat}
> [11:59:33,134][INFO][tcp-disco-msg-worker-#2][TcpDiscoverySpi] Enqueued
> message type = TcpDiscoveryCustomEventMessage id =
> e4b542b6561-a38dfe31-dcfd-430b-acb3-5a531db4197e time = 0
> [11:59:33,537][INFO][tcp-disco-msg-worker-#2][GridSnapshotAwareClusterStateProcessorImpl]
> Received activate request with BaselineTopology[id=0]
> [11:59:33,549][INFO][tcp-disco-msg-worker-#2][GridSnapshotAwareClusterStateProcessorImpl]
> Started state transition: true
> [11:59:33,752][INFO][exchange-worker-#62][time] Started exchange init
> [topVer=AffinityTopologyVersion [topVer=110, minorTopVer=1], crd=true,
> evt=DISCOVERY_CUSTOM_EVT, evtNode=a38dfe31-dcfd-430b-acb3-5a531db4197e,
> customEvt=ChangeGlobalStateMessage
> [id=cea542b6561-47395de6-c204-4576-a0a3-99ec53d41ac3,
> reqId=5b651439-7a6a-43fc-9cb0-d646c3380576,
> initiatingNodeId=a38dfe31-dcfd-430b-acb3-5a531db4197e, activate=true,
> baselineTopology=BaselineTopology [id=0, branchingHash=-69412111965,
> branchingType='New BaselineTopology', baselineNodes=[node42, node43, node44,
> node45, node46, node47, node48, node49, node50, node51, node52, node53,
> node54, node55, node56, node57, node58, node59, node1, node4, node5, node2,
> node3, node8, node9, node6, node7, node60, node61, node62, node63, node64,
> node65, node66, node67, node68, node69, node70, node71, node72, node73,
> node74, node75, node76, node77, node78, node79, node80, node81, node82,
> node83, node84, node85, node86, node87, node88, node89, node90, node91,
> node92, node93, node94, node95, node96, node97, node10, node98, node11,
> node99, node12, node13, node14, node15, node16, node100, node17, node18,
> node19, node108, node107, node106, node105, node104, node103, node102,
> node101, node109, node20, node21, node22, node23, node24, node25, node26,
> node27, node28, node29, node110, node30, node31, node32, node33, node34,
> node35, node36, node37, node38, node39, node40, node41]],
> forceChangeBaselineTopology=false, timestamp=1535101173015], allowMerge=false]
> [11:59:33,753][INFO][exchange-worker-#62][GridDhtPartitionsExchangeFuture]
> Start activation process [nodeId=1906b9c3-73f4-4c30-85cc-cf6b99c3bab9,
> client=false, topVer=AffinityTopologyVersion [topVer=110, minorTopVer=1]]
> [11:59:33,756][INFO][exchange-worker-#62][FilePageStoreManager] Resolved page
> store work directory:
> /storage/ssd/avolkov/tiden/snapshots-180824-114937/test_pitr/ignite.server.1/work/db/node1
> [11:59:33,756][INFO][exchange-worker-#62][FileWriteAheadLogManager] Resolved
> write ahead log work directory:
> /storage/ssd/avolkov/tiden/snapshots-180824-114937/test_pitr/ignite.server.1/work/db/wal/node1
> [11:59:33,756][INFO][exchange-worker-#62][FileWriteAheadLogManager] Resolved
> write ahead log archive directory:
> /storage/ssd/avolkov/tiden/snapshots-180824-114937/test_pitr/ignite.server.1/work/db/wal/archive/node1
> [11:59:33,757][INFO][exchange-worker-#62][FileWriteAheadLogManager] Started
> write-ahead log manager [mode=LOG_ONLY]
> [11:59:33,763][INFO][tcp-disco-msg-worker-#2][TcpDiscoverySpi] Processed
> message type = TcpDiscoveryCustomEventMessage id =
> e4b542b6561-a38dfe31-dcfd-430b-acb3-5a531db4197e time = 629
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)