[
https://issues.apache.org/jira/browse/IGNITE-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ksenia Rybakova updated IGNITE-5707:
------------------------------------
Description:
Load test config:
- CacheRandomOperationBenchmark
- 8 clients, 48 servers at 8 hosts
- 26 physical caches of different types with different memory policies + 30
groups with 10 partitioned caches each + 20 groups with 10 replicated caches
each. Total 526 caches.
- Preloading amount: 50K, key range: 60K
Complete configs are attached.
3 of 8 clients have following messages during preloading:
{noformat}
[12:17:56] (err) Failed to execute compound future reducer: GridCompoundFuture
[rdc=null, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null,
futs=[true, false, false]][12:17:56] (err) Failed to
execute compound future reducer: GridCompoundFuture [rdc=null, initFlag=1,
lsnrCalls=0, done=false, cancelled=false, err=null, futs=[true, true, false,
false, false, false, false, false, false, false, false,
false, false, false, false, false, false, false, false, false, false, false,
false, false, false, false, false, false, false, false, false, false, false,
false, false, false, false]][12:17:56] (err) Failed
to execute compound future reducer: GridCompoundFuture [rdc=null, initFlag=1,
lsnrCalls=0, done=false, cancelled=false, err=null, futs=[true, true, false,
false, false, false, false, false, false, false, fal
se, false, false, false, false, false, false, false, false, false, false,
false, false, false, false, false, false, false, false, false, false, false,
false, false, false, false, false]]class org.apache.igni
te.IgniteCheckedException: DataStreamer request failed
[node=16a20d0c-4009-4bfa-ad6e-0261d9e3b2a3]
at
org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$Buffer.onResponse(DataStreamerImpl.java:1785)
at
org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$3.onMessage(DataStreamerImpl.java:333)
at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
at
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
at
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1097)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: class org.apache.ignite.IgniteCheckedException: DataStreamer will
retry data transfer at stable topology [reqTop=AffinityTopologyVersion
[topVer=56, minorTopVer=0], topVer=AffinityTopologyVersion
[topVer=56, minorTopVer=1], node=remote]
at
org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.localUpdate(DataStreamProcessor.java:343)
at
org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.processRequest(DataStreamProcessor.java:301)
at
org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.access$000(DataStreamProcessor.java:58)
at
org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor$1.onMessage(DataStreamProcessor.java:88)
... 7 more
{noformat}
2 drivers were able to resume streaming after some time, but 1 didn't (error
messages continued to be printed). This driver had high heap utilization, that
resulted in long GC pause. Finally it was considered failed by other nodes.
Also it was observed that all clients have in log:
{noformat}
[2017-07-06 12:18:11,780][INFO ][exchange-worker-#41%null%][time] Started
exchange init [topVer=AffinityTopologyVersion [topVer=56, minorTopVer=1],
crd=false, evt=18, node=TcpDiscoveryNode
[id=71174cde-6bcf-43b6-a97b-02e0c987a8da, addrs=[127.0.0.1, 172.25.1.31],
sockAddrs=[testagent01.gridgain.local/172.25.1.31:0, /127.0.0.1:0], discPort=0,
order=49, intOrder=0, lastExchangeTime=1499332605385, loc=true,
ver=2.1.0#20170705-sha1:ad42f620, isClient=true], evtNode=TcpDiscoveryNode
[id=71174cde-6bcf-43b6-a97b-02e0c987a8da, addrs=[127.0.0.1, 172.25.1.31],
sockAddrs=[testagent01.gridgain.local/172.25.1.31:0, /127.0.0.1:0], discPort=0,
order=49, intOrder=0, lastExchangeTime=1499332605385, loc=true,
ver=2.1.0#20170705-sha1:ad42f620, isClient=true],
customEvt=CacheAffinityChangeMessage
[id=b5a8f271d51-f9bb8d96-c609-4de4-b32f-761c2a33ad10,
topVer=AffinityTopologyVersion [topVer=48, minorTopVer=0], exchId=null,
partsMsg=null, exchangeNeeded=true]]
[2017-07-06 12:18:12,284][WARN ][pool-5-thread-2][GridDhtPartitionTopologyImpl]
Requested topology version does not match calculated diff, will require full
iteration tocalculate mapping [topVer=AffinityTopologyVersion [topVer=56,
minorTopVer=0], diffVer=AffinityTopologyVersion [topVer=56, minorTopVer=1]]
[2017-07-06 12:18:12,642][WARN ][sys-#56%null%][GridDhtPartitionTopologyImpl]
Requested topology version does not match calculated diff, will require full
iteration tocalculate mapping [topVer=AffinityTopologyVersion [topVer=56,
minorTopVer=1], diffVer=AffinityTopologyVersion [topVer=56, minorTopVer=0]]
[2017-07-06 12:18:12,919][INFO ][exchange-worker-#41%null%][time] Finished
exchange init [topVer=AffinityTopologyVersion [topVer=56, minorTopVer=1],
crd=false]
{noformat}
was:
Load test config:
- CacheRandomOperationBenchmark
- 8 clients, 48 servers at 8 hosts
- 26 physical caches of different types with different memory policies + 30
groups with 10 partitioned caches each + 20 groups with 10 replicated caches
each. Total 526 caches.
- Preloading amount: 50K, key range: 60K
Complete configs are attached.
3 of 8 clients have following messages during preloading:
{noformat}
[12:17:56] (err) Failed to execute compound future reducer: GridCompoundFuture
[rdc=null, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null,
futs=[true, false, false]][12:17:56] (err) Failed to
execute compound future reducer: GridCompoundFuture [rdc=null, initFlag=1,
lsnrCalls=0, done=false, cancelled=false, err=null, futs=[true, true, false,
false, false, false, false, false, false, false, false,
false, false, false, false, false, false, false, false, false, false, false,
false, false, false, false, false, false, false, false, false, false, false,
false, false, false, false]][12:17:56] (err) Failed
to execute compound future reducer: GridCompoundFuture [rdc=null, initFlag=1,
lsnrCalls=0, done=false, cancelled=false, err=null, futs=[true, true, false,
false, false, false, false, false, false, false, fal
se, false, false, false, false, false, false, false, false, false, false,
false, false, false, false, false, false, false, false, false, false, false,
false, false, false, false, false]]class org.apache.igni
te.IgniteCheckedException: DataStreamer request failed
[node=16a20d0c-4009-4bfa-ad6e-0261d9e3b2a3]
at
org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$Buffer.onResponse(DataStreamerImpl.java:1785)
at
org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$3.onMessage(DataStreamerImpl.java:333)
at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
at
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
at
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1097)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: class org.apache.ignite.IgniteCheckedException: DataStreamer will
retry data transfer at stable topology [reqTop=AffinityTopologyVersion
[topVer=56, minorTopVer=0], topVer=AffinityTopologyVersion
[topVer=56, minorTopVer=1], node=remote]
at
org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.localUpdate(DataStreamProcessor.java:343)
at
org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.processRequest(DataStreamProcessor.java:301)
at
org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.access$000(DataStreamProcessor.java:58)
at
org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor$1.onMessage(DataStreamProcessor.java:88)
... 7 more
{noformat}
2 drivers were able to resume streaming after some time, but 1 didn't (error
messages continued to be printed). This driver had high heap utilization, that
resulted in long GC pause. Finally it was considered failed by other nodes.
> Client can't resume streaming even after topology got stable during load test
> -----------------------------------------------------------------------------
>
> Key: IGNITE-5707
> URL: https://issues.apache.org/jira/browse/IGNITE-5707
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.1
> Reporter: Ksenia Rybakova
>
> Load test config:
> - CacheRandomOperationBenchmark
> - 8 clients, 48 servers at 8 hosts
> - 26 physical caches of different types with different memory policies + 30
> groups with 10 partitioned caches each + 20 groups with 10 replicated caches
> each. Total 526 caches.
> - Preloading amount: 50K, key range: 60K
> Complete configs are attached.
> 3 of 8 clients have following messages during preloading:
> {noformat}
> [12:17:56] (err) Failed to execute compound future reducer:
> GridCompoundFuture [rdc=null, initFlag=1, lsnrCalls=0, done=false,
> cancelled=false, err=null, futs=[true, false, false]][12:17:56] (err) Failed
> to
> execute compound future reducer: GridCompoundFuture [rdc=null, initFlag=1,
> lsnrCalls=0, done=false, cancelled=false, err=null, futs=[true, true, false,
> false, false, false, false, false, false, false, false,
> false, false, false, false, false, false, false, false, false, false, false,
> false, false, false, false, false, false, false, false, false, false, false,
> false, false, false, false]][12:17:56] (err) Failed
> to execute compound future reducer: GridCompoundFuture [rdc=null, initFlag=1,
> lsnrCalls=0, done=false, cancelled=false, err=null, futs=[true, true, false,
> false, false, false, false, false, false, false, fal
> se, false, false, false, false, false, false, false, false, false, false,
> false, false, false, false, false, false, false, false, false, false, false,
> false, false, false, false, false]]class org.apache.igni
> te.IgniteCheckedException: DataStreamer request failed
> [node=16a20d0c-4009-4bfa-ad6e-0261d9e3b2a3]
> at
> org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$Buffer.onResponse(DataStreamerImpl.java:1785)
> at
> org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$3.onMessage(DataStreamerImpl.java:333)
> at
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
> at
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
> at
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
> at
> org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1097)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: class org.apache.ignite.IgniteCheckedException: DataStreamer will
> retry data transfer at stable topology [reqTop=AffinityTopologyVersion
> [topVer=56, minorTopVer=0], topVer=AffinityTopologyVersion
> [topVer=56, minorTopVer=1], node=remote]
> at
> org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.localUpdate(DataStreamProcessor.java:343)
> at
> org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.processRequest(DataStreamProcessor.java:301)
> at
> org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.access$000(DataStreamProcessor.java:58)
> at
> org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor$1.onMessage(DataStreamProcessor.java:88)
> ... 7 more
> {noformat}
> 2 drivers were able to resume streaming after some time, but 1 didn't (error
> messages continued to be printed). This driver had high heap utilization,
> that resulted in long GC pause. Finally it was considered failed by other
> nodes.
> Also it was observed that all clients have in log:
> {noformat}
> [2017-07-06 12:18:11,780][INFO ][exchange-worker-#41%null%][time] Started
> exchange init [topVer=AffinityTopologyVersion [topVer=56, minorTopVer=1],
> crd=false, evt=18, node=TcpDiscoveryNode
> [id=71174cde-6bcf-43b6-a97b-02e0c987a8da, addrs=[127.0.0.1, 172.25.1.31],
> sockAddrs=[testagent01.gridgain.local/172.25.1.31:0, /127.0.0.1:0],
> discPort=0, order=49, intOrder=0, lastExchangeTime=1499332605385, loc=true,
> ver=2.1.0#20170705-sha1:ad42f620, isClient=true], evtNode=TcpDiscoveryNode
> [id=71174cde-6bcf-43b6-a97b-02e0c987a8da, addrs=[127.0.0.1, 172.25.1.31],
> sockAddrs=[testagent01.gridgain.local/172.25.1.31:0, /127.0.0.1:0],
> discPort=0, order=49, intOrder=0, lastExchangeTime=1499332605385, loc=true,
> ver=2.1.0#20170705-sha1:ad42f620, isClient=true],
> customEvt=CacheAffinityChangeMessage
> [id=b5a8f271d51-f9bb8d96-c609-4de4-b32f-761c2a33ad10,
> topVer=AffinityTopologyVersion [topVer=48, minorTopVer=0], exchId=null,
> partsMsg=null, exchangeNeeded=true]]
> [2017-07-06 12:18:12,284][WARN
> ][pool-5-thread-2][GridDhtPartitionTopologyImpl] Requested topology version
> does not match calculated diff, will require full iteration tocalculate
> mapping [topVer=AffinityTopologyVersion [topVer=56, minorTopVer=0],
> diffVer=AffinityTopologyVersion [topVer=56, minorTopVer=1]]
> [2017-07-06 12:18:12,642][WARN ][sys-#56%null%][GridDhtPartitionTopologyImpl]
> Requested topology version does not match calculated diff, will require full
> iteration tocalculate mapping [topVer=AffinityTopologyVersion [topVer=56,
> minorTopVer=1], diffVer=AffinityTopologyVersion [topVer=56, minorTopVer=0]]
> [2017-07-06 12:18:12,919][INFO ][exchange-worker-#41%null%][time] Finished
> exchange init [topVer=AffinityTopologyVersion [topVer=56, minorTopVer=1],
> crd=false]
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)