This looks like long GC pauses to me. Can you share the full logs from all the nodes?

As a quick workaround, I can suggest increasing failureDetectionTimeout, but it's definitely not a final solution.

Thanks,
Evgenii

Thu, 13 Jun 2019 at 14:57, vinod.jv <mail2v...@gmail.com>:

> Hi,
>
> We are using Apache Ignite in embedded mode to store data as key-value
> pairs and query the data.
> Sometimes the Spark jobs run much longer than expected, and in those
> scenarios we have noticed that Ignite nodes do not respond within the
> heartbeat interval, so data rebalancing starts, followed by query
> cancellation.
>
> When the job finishes in the expected time, we don't see any exceptions in
> the log.
>
> Here are the exceptions we get.
>
> org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
>
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]
>
> 19/06/13 02:14:08 ERROR twostep.GridMapQueryExecutor: Failed to execute local query.
> class org.apache.ignite.cache.query.QueryCancelledException: The query was cancelled while executing.
>         at org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest0(GridMapQueryExecutor.java:558)
>         at org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest(GridMapQueryExecutor.java:449)
>         at org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onMessage(GridMapQueryExecutor.java:203)
>         at org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor$2.onMessage(GridMapQueryExecutor.java:178)
>         at org.apache.ignite.internal.managers.communication.GridIoManager$ArrayListener.onMessage(GridIoManager.java:1915)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1082)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:710)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.access$1700(GridIoManager.java:102)
>         at org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:673)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> 19/06/13 02:14:08 ERROR twostep.GridMapQueryExecutor: Failed to execute local query.
> class org.apache.ignite.cache.query.QueryCancelledException: The query was cancelled while executing.
>         at org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest0(GridMapQueryExecutor.java:595)
>         at org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest(GridMapQueryExecutor.java:449)
>         at org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onMessage(GridMapQueryExecutor.java:203)
>         at org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor$2.onMessage(GridMapQueryExecutor.java:178)
>         at org.apache.ignite.internal.managers.communication.GridIoManager$ArrayListener.onMessage(GridIoManager.java:1915)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1082)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:710)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.access$1700(GridIoManager.java:102)
>         at org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:673)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
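
On the GC-pause hypothesis from the reply: until full GC logs are available, a rough first check can be run inside the same JVM that hosts the embedded Ignite node. The sketch below uses only the standard java.lang.management API to measure how much collection time accumulates over a sampling window. The class name GcPauseCheck, the helper names, and the 10 s budget (chosen to match the "[10 seconds]" timeout in the log) are assumptions for illustration and are not part of Ignite or Spark. Note that getCollectionTime() reports accumulated elapsed time per collector, not the longest single pause, so GC logs remain the more reliable evidence.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcPauseCheck {

    /** Sums the accumulated collection time (ms) over all collectors in this JVM. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** True if the GC time accumulated between two samples exceeds the budget. */
    public static boolean exceedsBudget(long gcMillisBefore, long gcMillisAfter, long budgetMillis) {
        return (gcMillisAfter - gcMillisBefore) > budgetMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long budgetMillis = 10_000;   // assumption: mirrors the 10 s timeout seen in the log
        long windowMillis = 1_000;    // hypothetical sampling window; use a longer one in practice

        long before = totalGcMillis();
        Thread.sleep(windowMillis);
        long after = totalGcMillis();

        System.out.println("GC time in window (ms): " + (after - before));
        System.out.println("Exceeds budget: " + exceedsBudget(before, after, budgetMillis));
    }
}
```

If the GC time accumulated in a window is close to the window length itself, the node was mostly stalled during that period, which would be consistent with missed heartbeats. In that case the full GC logs should show the individual pauses.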