[ https://issues.apache.org/jira/browse/IGNITE-23877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrey Khitrin updated IGNITE-23877:
------------------------------------
    Affects Version/s: 3.0

> "Replication is timed out" when 1 of 2 nodes is down
> ----------------------------------------------------
>
>                 Key: IGNITE-23877
>                 URL: https://issues.apache.org/jira/browse/IGNITE-23877
>             Project: Ignite
>          Issue Type: Bug
>          Components: persistence
>    Affects Versions: 3.0, 3.0.0-beta1
>         Environment: 2 nodes (1 node is CMG, each node "-Xms512m", "-Xmx1536m"), each on a separate host. Each host: 4 vCPUs, 32 GB memory.
>            Reporter: Igor
>            Priority: Major
>              Labels: ignite-3
>         Attachments: servers_logsr.zip
>
>
> *Steps to reproduce:*
> # Start 2 nodes (1 node is CMG, each node "-Xms512m", "-Xmx1536m"), each on a separate host. Each host: 4 vCPUs, 32 GB memory.
> # Set up a connection to both nodes:
> {code:java}
> IgniteClient client = IgniteClient.builder()
>         .retryPolicy(new RetryLimitPolicy())
>         .addresses(thinClientEndpoints.toArray(new String[0]))
>         .build();
> {code}
> # Create a distribution zone.
> # Create a table.
> # Insert row(s).
> # Select all rows before killing the node.
> # Wait until the local state of all partitions of all tables is "HEALTHY".
> # Wait until the global state of all partitions of all tables is "AVAILABLE".
> # Kill the second (non-CMG) node.
> # Select all rows after killing the node.
> *Expected:*
> Correct data is returned.
> *Actual:*
> Exception returned on step 10:
> {code:java}
> org.opentest4j.AssertionFailedError: org.opentest4j.AssertionFailedError: Select after node is killed ==> Unexpected exception thrown: org.apache.ignite.sql.SqlException: Replication is timed out [replicaGrpId=17_part_10]
> org.opentest4j.AssertionFailedError: Select after node is killed ==> Unexpected exception thrown: org.apache.ignite.sql.SqlException: Replication is timed out [replicaGrpId=17_part_10]
>     at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:152)
>     at app//org.junit.jupiter.api.AssertDoesNotThrow.createAssertionFailedError(AssertDoesNotThrow.java:84)
>     at app//org.junit.jupiter.api.AssertDoesNotThrow.assertDoesNotThrow(AssertDoesNotThrow.java:53)
>     at app//org.junit.jupiter.api.AssertDoesNotThrow.assertDoesNotThrow(AssertDoesNotThrow.java:40)
>     at app//org.junit.jupiter.api.Assertions.assertDoesNotThrow(Assertions.java:3183)
>     at app//org.gridgain.ai3tests.tests.ConnectionAfterNodeIsKilledTest.testThinClientConnectionToMultipleHostAfter1NodeIsKilled(ConnectionAfterNodeIsKilledTest.java:136)
>     at java.base@21.0.2/java.lang.reflect.Method.invoke(Method.java:580)
>     at java.base@21.0.2/java.util.concurrent.FutureTask.run(FutureTask.java:317)
>     at java.base@21.0.2/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
>     at java.base@21.0.2/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
>     at java.base@21.0.2/java.lang.Thread.run(Thread.java:1583)
> Caused by: org.apache.ignite.sql.SqlException: IGN-REP-3 TraceId:4fffa9f4-9fcb-43b0-bc7d-8fb37c6dfaa0 Replication is timed out [replicaGrpId=17_part_10]
>     at java.base@21.0.2/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:733)
>     at app//org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:789)
>     at app//org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:723)
>     at app//org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:525)
>     at app//org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCauseInternal(ExceptionUtils.java:658)
>     at app//org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:476)
>     at app//org.apache.ignite.internal.client.sql.ClientSql.execute(ClientSql.java:106)
>     at app//org.apache.ignite.sql.IgniteSql.execute(IgniteSql.java:57)
>     at app//org.gridgain.ai3tests.tests.teststeps.ThinClientSteps.lambda$executeQuery$0(ThinClientSteps.java:64)
>     at app//io.qameta.allure.Allure.lambda$step$1(Allure.java:127)
>     at app//io.qameta.allure.Allure.step(Allure.java:181)
>     at app//io.qameta.allure.Allure.step(Allure.java:125)
>     at app//org.gridgain.ai3tests.tests.teststeps.ThinClientSteps.executeQuery(ThinClientSteps.java:64)
>     at app//org.gridgain.ai3tests.tests.TestUtils.selectAll(TestUtils.java:174)
>     at app//org.gridgain.ai3tests.tests.ConnectionAfterNodeIsKilledTest.lambda$testThinClientConnectionToMultipleHostAfter1NodeIsKilled$0(ConnectionAfterNodeIsKilledTest.java:137)
>     at app//org.junit.jupiter.api.AssertDoesNotThrow.assertDoesNotThrow(AssertDoesNotThrow.java:49)
>     ... 8 more
> Caused by: java.util.concurrent.CompletionException: org.apache.ignite.sql.SqlException: IGN-REP-3 TraceId:4fffa9f4-9fcb-43b0-bc7d-8fb37c6dfaa0 Replication is timed out [replicaGrpId=17_part_10]
>     at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315)
>     at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320)
>     at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:936)
>     at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
>     at java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:483)
>     at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
>     at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1312)
>     at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1843)
>     at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1808)
>     at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)
> Caused by: org.apache.ignite.sql.SqlException: IGN-REP-3 TraceId:4fffa9f4-9fcb-43b0-bc7d-8fb37c6dfaa0 Replication is timed out [replicaGrpId=17_part_10]
>     at java.base@21.0.2/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:733)
>     at app//org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:789)
>     at app//org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:723)
>     at app//org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:525)
>     at app//org.apache.ignite.internal.util.ViewUtils.copyExceptionWithCauseIfPossible(ViewUtils.java:91)
>     at app//org.apache.ignite.internal.util.ViewUtils.ensurePublicException(ViewUtils.java:71)
>     at app//org.apache.ignite.internal.client.TcpClientChannel.lambda$send$4(TcpClientChannel.java:388)
>     at java.base@21.0.2/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
>     ... 7 more
> Caused by: org.apache.ignite.sql.SqlException: IGN-REP-3 TraceId:4fffa9f4-9fcb-43b0-bc7d-8fb37c6dfaa0 Replication is timed out [replicaGrpId=17_part_10]
>     at java.base@21.0.2/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:733)
>     at app//org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:789)
>     at app//org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:723)
>     at app//org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:525)
>     at app//org.apache.ignite.internal.client.TcpClientChannel.readError(TcpClientChannel.java:554)
>     at app//org.apache.ignite.internal.client.TcpClientChannel.processNextMessage(TcpClientChannel.java:448)
>     at app//org.apache.ignite.internal.client.TcpClientChannel.onMessage(TcpClientChannel.java:271)
>     at app//org.apache.ignite.internal.client.io.netty.NettyClientConnection.onMessage(NettyClientConnection.java:117)
>     at app//org.apache.ignite.internal.client.io.netty.NettyClientMessageHandler.channelRead(NettyClientMessageHandler.java:33)
>     at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
>     at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
>     at app//io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
>     at app//io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
>     at app//io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
>     at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
>     at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
>     at app//io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
>     at app//io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1357)
>     at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
>     at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
>     at app//io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:868)
>     at app//io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
>     at app//io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
>     at app//io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
>     at app//io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
>     at app//io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
>     at app//io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
>     at app//io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>     at app//io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base@21.0.2/java.lang.Thread.run(Thread.java:1583)
> Caused by: org.apache.ignite.lang.IgniteException: IGN-REP-3 TraceId:4fffa9f4-9fcb-43b0-bc7d-8fb37c6dfaa0 To see the full stack trace set clientConnector.sendServerExceptionStackTraceToClient:true
>     at app//org.apache.ignite.internal.client.TcpClientChannel.readError(TcpClientChannel.java:519)
>     ... 25 more
> {code}
> [^servers_logsr.zip]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
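
For triage, the exception messages in the trace above all carry the same three identifiers: the error code (IGN-REP-3), the TraceId, and the replica group (17_part_10). The following is a minimal sketch of pulling them out of a cause chain so they can be matched against the attached server logs. It uses only the JDK; the class name `ReplicationErrorInfo` and the message layout it expects are assumptions inferred from this one trace, not part of the Ignite API.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical triage helper; not part of Apache Ignite. */
final class ReplicationErrorInfo {
    // Assumed message layout, inferred from the trace in this report:
    // "<code> TraceId:<uuid> <text> [replicaGrpId=<id>]"
    private static final Pattern MESSAGE = Pattern.compile(
            "(IGN-[A-Z]+-\\d+)\\s+TraceId:([0-9a-fA-F-]+)\\s+(.*?)\\s*\\[replicaGrpId=([^\\]]+)\\]");

    final String errorCode;
    final String traceId;
    final String message;
    final String replicaGroupId;

    private ReplicationErrorInfo(String errorCode, String traceId, String message, String replicaGroupId) {
        this.errorCode = errorCode;
        this.traceId = traceId;
        this.message = message;
        this.replicaGroupId = replicaGroupId;
    }

    /** Parses a single exception message; empty if it does not match the assumed layout. */
    static Optional<ReplicationErrorInfo> parse(String text) {
        if (text == null) {
            return Optional.empty();
        }
        Matcher m = MESSAGE.matcher(text);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new ReplicationErrorInfo(m.group(1), m.group(2), m.group(3), m.group(4)));
    }

    /** Walks the cause chain (bounded depth) and returns the first message that parses. */
    static Optional<ReplicationErrorInfo> fromCauseChain(Throwable error) {
        int depth = 0;
        for (Throwable t = error; t != null && depth < 64; t = t.getCause(), depth++) {
            Optional<ReplicationErrorInfo> parsed = parse(t.getMessage());
            if (parsed.isPresent()) {
                return parsed;
            }
        }
        return Optional.empty();
    }
}
```

In a test such as the one in this report, `ReplicationErrorInfo.fromCauseChain(e)` could be called in the failure handler and the resulting `traceId` used to search the server logs. Messages that lack the code and TraceId prefix, like the outermost AssertionFailedError here, are skipped until a deeper cause matches.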