[ https://issues.apache.org/jira/browse/IGNITE-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pavel Tupitsyn updated IGNITE-3212:
-----------------------------------
    Fix Version/s:     (was: 3.0)

> Servers get stuck with the warning "Failed to wait for initial partition map exchange" during falover test
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-3212
>                 URL: https://issues.apache.org/jira/browse/IGNITE-3212
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 1.6
>            Reporter: Ksenia Rybakova
>            Priority: Critical
>
> Servers being restarted during failover test get stuck after some time with the warning "Failed to wait for initial partition map exchange".
> {noformat}
> [08:44:41,303][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] Added new node to topology: TcpDiscoveryNode [id=db557f04-43b7-4e28-ae0d-d4dcf4139c89, addrs=[10.20.0.222, 127.0.0.1], sockAddrs=[fosters-222/10.20.0.222:47503, /10.20.0.222:47503, /127.0.0.1:47503], discPort=47503, order=44, intOrder=32, lastExchangeTime=1464363880917, loc=false, ver=1.6.0#20160525-sha1:48321a40, isClient=false]
> [08:44:41,304][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] Topology snapshot [ver=44, servers=19, clients=1, CPUs=64, heap=160.0GB]
> [08:45:11,455][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] Added new node to topology: TcpDiscoveryNode [id=6fae61a7-c1c1-40e5-8ad0-8bf5d6c86eb7, addrs=[10.20.0.223, 127.0.0.1], sockAddrs=[fosters-223/10.20.0.223:47503, /10.20.0.223:47503, /127.0.0.1:47503], discPort=47503, order=45, intOrder=33, lastExchangeTime=1464363910999, loc=false, ver=1.6.0#20160525-sha1:48321a40, isClient=false]
> [08:45:11,455][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] Topology snapshot [ver=45, servers=20, clients=1, CPUs=64, heap=170.0GB]
> [08:45:19,942][INFO ][ignite-update-notifier-timer][GridUpdateNotifier] Update status is not available.
> [08:46:20,370][WARN ][main][GridCachePartitionExchangeManager] Failed to wait for initial partition map exchange. Possible reasons are:
>   ^-- Transactions in deadlock.
>   ^-- Long running transactions (ignore if this is the case).
>   ^-- Unreleased explicit locks.
> [08:48:30,375][WARN ][main][GridCachePartitionExchangeManager] Still waiting for initial partition map exchange ...
> {noformat}
> "Failed to wait for partition release future" warnings are on other nodes.
> {noformat}
> [08:09:45,822][WARN ][exchange-worker-#82%null%][GridDhtPartitionsExchangeFuture] Failed to wait for partition release future [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], node=cab5d0e0-7365-4774-8f99-d9f131c5d896]. Dumping pending objects that might be the cause:
> [08:09:45,822][WARN ][exchange-worker-#82%null%][GridCachePartitionExchangeManager] Ready affinity version: AffinityTopologyVersion [topVer=28, minorTopVer=1]
> [08:09:45,826][WARN ][exchange-worker-#82%null%][GridCachePartitionExchangeManager] Last exchange future: GridDhtPartitionsExchangeFuture ...
> {noformat}
> Load config:
> - 1 client, 20 servers (5 servers per 1 host)
> - warmup 60
> - duration 66h
> - preload 5M
> - key range 10M
> - operations: PUT PUT_ALL GET GET_ALL INVOKE INVOKE_ALL REMOVE REMOVE_ALL PUT_IF_ABSENT REPLACE
> - backups count 3
> - 3 servers restart every 15 min with 30 sec step, pause between stop and start 5min

--
This message was sent by Atlassian Jira
(v8.20.10#820010)