Some of those tests try to reproduce specific stress conditions, which takes a lot of resources. Have you tried running those individual tests in isolation, so that they are not competing for resources? Do they always fail, or are the failures transient?
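One way to tell consistent failures from transient ones is to run the same test several times and count the outcomes. The following is only a minimal sketch: `run_repeatedly` is a hypothetical helper written for this illustration, not part of Accumulo or Maven, and the Maven command in the trailing comment is the one quoted in the original message below.

```shell
#!/bin/sh
# Hypothetical helper (not part of Accumulo or Maven): run a command N times
# and report how many runs passed and how many failed.
run_repeatedly() {
  runs=$1
  shift
  passed=0
  failed=0
  i=1
  while [ "$i" -le "$runs" ]; do
    # Command output is discarded here; redirect it to a file instead
    # if the logs of each run are needed.
    if "$@" >/dev/null 2>&1; then
      passed=$((passed + 1))
    else
      failed=$((failed + 1))
    fi
    i=$((i + 1))
  done
  echo "passed=$passed failed=$failed"
}

# Example invocation, using the command from the original message:
# run_repeatedly 5 mvn verify -Dit.test=ConcurrentDeleteTableIT -o -rf :accumulo-test
```

If every run fails, the problem is likely deterministic on that machine; a mix of passes and failures points toward timing or resource contention.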
-----Original Message-----
From: Mark Jens <mark.r.j...@gmail.com>
Sent: Tuesday, November 30, 2021 4:05 AM
To: dev@accumulo.apache.org
Subject: Consistent IT tests failures on Linux ARM64

Hello Accumulo community,

At my job we are considering using Linux ARM64 servers and I've been tasked to test Accumulo. I face some timeout-related issues with several IT tests:

[ERROR] org.apache.accumulo.test.functional.ConcurrentDeleteTableIT.testConcurrentFateOpsWithDelete Time elapsed: 420.122 s <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 420 seconds
    at java.base@11.0.11/jdk.internal.misc.Unsafe.park(Native Method)
    at java.base@11.0.11/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
    at java.base@11.0.11/java.util.concurrent.FutureTask.awaitDone(FutureTask.java:447)
    at java.base@11.0.11/java.util.concurrent.FutureTask.get(FutureTask.java:190)
    at app//org.apache.accumulo.test.functional.ConcurrentDeleteTableIT.testConcurrentFateOpsWithDelete(ConcurrentDeleteTableIT.java:213)
    at java.base@11.0.11/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base@11.0.11/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base@11.0.11/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base@11.0.11/java.lang.reflect.Method.invoke(Method.java:566)
    at app//org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at app//org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at app//org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at app//org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at app//org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at app//org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at app//org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
    at app//org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
    at java.base@11.0.11/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base@11.0.11/java.lang.Thread.run(Thread.java:829)

[ERROR] org.apache.accumulo.test.functional.ConcurrentDeleteTableIT.testConcurrentFateOpsWithDelete Time elapsed: 420.122 s <<< ERROR!
java.lang.Exception: Appears to be stuck in thread Time-limited test-SendThread(localhost:44251)
    at java.base@11.0.11/sun.nio.ch.EPoll.wait(Native Method)
    at java.base@11.0.11/sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:120)
    at java.base@11.0.11/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:124)
    at java.base@11.0.11/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:136)
    at app//org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:347)
    at app//org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)

[ERROR] org.apache.accumulo.test.functional.ConcurrentDeleteTableIT.testConcurrentDeleteTablesOps Time elapsed: 420.011 s <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 420 seconds
    at java.base@11.0.11/java.lang.Thread.sleep(Native Method)
    at app//org.apache.accumulo.fate.zookeeper.ZooCache$ZooRunnable.retry(ZooCache.java:299)
    at app//org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:442)
    at app//org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:372)
    at app//org.apache.accumulo.core.clientImpl.ClientContext.verifyInstanceId(ClientContext.java:467)
    at app//org.apache.accumulo.core.clientImpl.ClientContext.getInstanceID(ClientContext.java:446)
    at app//org.apache.accumulo.core.clientImpl.ClientContext.getManagerLocations(ClientContext.java:405)
    at app//org.apache.accumulo.core.clientImpl.ManagerClient.getConnection(ManagerClient.java:59)
    at app//org.apache.accumulo.core.clientImpl.ManagerClient.getConnectionWithRetry(ManagerClient.java:49)
    at app//org.apache.accumulo.core.clientImpl.TableOperationsImpl.beginFateOperation(TableOperationsImpl.java:260)
    at app//org.apache.accumulo.core.clientImpl.TableOperationsImpl.doFateOperation(TableOperationsImpl.java:369)
    at app//org.apache.accumulo.core.clientImpl.TableOperationsImpl.doFateOperation(TableOperationsImpl.java:359)
    at app//org.apache.accumulo.core.clientImpl.TableOperationsImpl.doTableFateOperation(TableOperationsImpl.java:1670)
    at app//org.apache.accumulo.core.clientImpl.TableOperationsImpl.create(TableOperationsImpl.java:248)
    at app//org.apache.accumulo.test.functional.ConcurrentDeleteTableIT.testConcurrentDeleteTablesOps(ConcurrentDeleteTableIT.java:76)
    at java.base@11.0.11/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base@11.0.11/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base@11.0.11/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base@11.0.11/java.lang.reflect.Method.invoke(Method.java:566)
    at app//org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at app//org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at app//org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at app//org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at app//org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at app//org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at app//org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
    at app//org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
    at java.base@11.0.11/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base@11.0.11/java.lang.Thread.run(Thread.java:829)

[INFO] Running org.apache.accumulo.test.functional.ScannerContextIT
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 102.909 s - in org.apache.accumulo.test.functional.ScannerContextIT
[INFO] Running org.apache.accumulo.test.functional.KerberosRenewalIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 504.472 s - in org.apache.accumulo.test.functional.KerberosRenewalIT
[INFO] Running org.apache.accumulo.test.functional.BatchWriterFlushIT
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 62.132 s - in org.apache.accumulo.test.functional.BatchWriterFlushIT
[INFO] Running org.apache.accumulo.test.functional.BinaryIT
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 65.034 s - in org.apache.accumulo.test.functional.BinaryIT
[INFO] Running org.apache.accumulo.test.functional.PermissionsIT
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 59.25 s - in org.apache.accumulo.test.functional.PermissionsIT
[INFO] Running org.apache.accumulo.test.functional.ZookeeperRestartIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 37.37 s - in org.apache.accumulo.test.functional.ZookeeperRestartIT
[INFO] Running org.apache.accumulo.test.functional.CreateManyScannersIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 23.046 s - in org.apache.accumulo.test.functional.CreateManyScannersIT
[INFO] Running org.apache.accumulo.test.functional.CreateInitialSplitsIT
[INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 255.108 s - in org.apache.accumulo.test.functional.CreateInitialSplitsIT
[INFO] Running org.apache.accumulo.test.functional.MonitorSslIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 25.304 s - in org.apache.accumulo.test.functional.MonitorSslIT
[INFO] Running org.apache.accumulo.test.functional.RestartStressIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 78.359 s - in org.apache.accumulo.test.functional.RestartStressIT
[INFO] Running org.apache.accumulo.test.functional.BulkSplitOptimizationIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 59.289 s - in org.apache.accumulo.test.functional.BulkSplitOptimizationIT
[INFO] Running org.apache.accumulo.test.functional.BulkNewIT
[INFO] Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 63.696 s - in org.apache.accumulo.test.functional.BulkNewIT
[INFO] Running org.apache.accumulo.test.functional.BloomFilterIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 135.298 s - in org.apache.accumulo.test.functional.BloomFilterIT
[INFO] Running org.apache.accumulo.test.functional.BulkIT
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 122.959 s - in org.apache.accumulo.test.functional.BulkIT
[INFO] Running org.apache.accumulo.test.functional.BinaryStressIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.626 s - in org.apache.accumulo.test.functional.BinaryStressIT
[INFO] Running org.apache.accumulo.test.functional.ClassLoaderIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 45.61 s - in org.apache.accumulo.test.functional.ClassLoaderIT
[INFO] Running org.apache.accumulo.test.functional.LogicalTimeIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 116.819 s - in org.apache.accumulo.test.functional.LogicalTimeIT
[INFO] Running org.apache.accumulo.test.functional.SplitRecoveryIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 25.421 s - in org.apache.accumulo.test.functional.SplitRecoveryIT
[INFO] Running org.apache.accumulo.test.functional.BigRootTabletIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 96.86 s - in org.apache.accumulo.test.functional.BigRootTabletIT
[INFO] Running org.apache.accumulo.test.functional.GarbageCollectorIT
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 238.409 s - in org.apache.accumulo.test.functional.GarbageCollectorIT
[INFO] Running org.apache.accumulo.test.functional.BalanceInPresenceOfOfflineTableIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 219.253 s - in org.apache.accumulo.test.functional.BalanceInPresenceOfOfflineTableIT
[INFO] Running org.apache.accumulo.test.functional.VisibilityIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.015 s - in org.apache.accumulo.test.functional.VisibilityIT
[INFO] Running org.apache.accumulo.test.functional.SslWithClientAuthIT
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 489.863 s - in org.apache.accumulo.test.functional.SslWithClientAuthIT
[INFO] Running org.apache.accumulo.test.functional.SummaryIT
[INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 111.552 s - in org.apache.accumulo.test.functional.SummaryIT
[INFO] Running org.apache.accumulo.test.functional.MaxOpenIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.061 s - in org.apache.accumulo.test.functional.MaxOpenIT
[INFO] Running org.apache.accumulo.test.functional.ManagerFailoverIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 47.089 s - in org.apache.accumulo.test.functional.ManagerFailoverIT
[INFO] Running org.apache.accumulo.test.functional.DeleteRowsIT
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 229.586 s - in org.apache.accumulo.test.functional.DeleteRowsIT
[INFO] Running org.apache.accumulo.test.functional.BackupManagerIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.943 s - in org.apache.accumulo.test.functional.BackupManagerIT
[INFO] Running org.apache.accumulo.test.functional.TabletMetadataIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 46.728 s - in org.apache.accumulo.test.functional.TabletMetadataIT
[INFO] Running org.apache.accumulo.test.functional.LateLastContactIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 46.648 s - in org.apache.accumulo.test.functional.LateLastContactIT
[INFO] Running org.apache.accumulo.test.functional.SimpleBalancerFairnessIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 71.934 s - in org.apache.accumulo.test.functional.SimpleBalancerFairnessIT
[INFO] Running org.apache.accumulo.test.functional.HalfDeadTServerIT
[ERROR] Tests run: 3, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 307.904 s <<< FAILURE! - in org.apache.accumulo.test.functional.HalfDeadTServerIT
[ERROR] org.apache.accumulo.test.functional.HalfDeadTServerIT.testRecover Time elapsed: 240.011 s <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 240 seconds
    at java.base@11.0.11/java.lang.Object.wait(Native Method)
    at java.base@11.0.11/java.lang.Object.wait(Object.java:328)
    at java.base@11.0.11/java.lang.ProcessImpl.waitFor(ProcessImpl.java:495)
    at app//org.apache.accumulo.test.functional.HalfDeadTServerIT.test(HalfDeadTServerIT.java:217)
    at app//org.apache.accumulo.test.functional.HalfDeadTServerIT.testRecover(HalfDeadTServerIT.java:142)
    at java.base@11.0.11/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base@11.0.11/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base@11.0.11/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base@11.0.11/java.lang.reflect.Method.invoke(Method.java:566)
    at app//org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at app//org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at app//org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at app//org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at app//org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at app//org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at app//org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
    at app//org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
    at java.base@11.0.11/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base@11.0.11/java.lang.Thread.run(Thread.java:829)

[ERROR] org.apache.accumulo.test.functional.HalfDeadTServerIT.testRecover Time elapsed: 240.012 s <<< ERROR!
java.lang.Exception: Appears to be stuck in thread Time-limited test-SendThread(localhost:39285)
    at java.base@11.0.11/sun.nio.ch.EPoll.wait(Native Method)
    at java.base@11.0.11/sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:120)
    at java.base@11.0.11/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:124)
    at java.base@11.0.11/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:136)
    at app//org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:347)
    at app//org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)

[INFO] Running org.apache.accumulo.test.functional.MetadataIT
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 97.987 s - in org.apache.accumulo.test.functional.MetadataIT
[INFO] Running org.apache.accumulo.test.functional.ScanSessionTimeOutIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 43.91 s - in org.apache.accumulo.test.functional.ScanSessionTimeOutIT
[INFO] Running org.apache.accumulo.test.functional.ZooCacheIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 33.986 s - in org.apache.accumulo.test.functional.ZooCacheIT
[INFO] Running org.apache.accumulo.test.functional.DeleteRowsSplitIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 113.928 s - in org.apache.accumulo.test.functional.DeleteRowsSplitIT
[INFO] Running org.apache.accumulo.test.ScanFlushWithTimeIT
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 36.854 s - in org.apache.accumulo.test.ScanFlushWithTimeIT
[INFO] Running org.apache.accumulo.test.AuditMessageIT
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 165.169 s - in org.apache.accumulo.test.AuditMessageIT
[INFO] Running org.apache.accumulo.test.gc.replication.CloseWriteAheadLogReferencesIT
[WARNING] Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.039 s - in org.apache.accumulo.test.gc.replication.CloseWriteAheadLogReferencesIT
[INFO]
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] org.apache.accumulo.test.compaction.ExternalCompaction_3_IT.testCoordinatorRestartsDuringCompaction
[ERROR] Run 1: ExternalCompaction_3_IT.testCoordinatorRestartsDuringCompaction:178 » TestTimedOut
[ERROR] Run 2: ExternalCompaction_3_IT.testCoordinatorRestartsDuringCompaction » Appears to ...
[INFO]
[ERROR] ConcurrentDeleteTableIT.testConcurrentDeleteTablesOps:76 » TestTimedOut test t...
[ERROR] org.apache.accumulo.test.functional.ConcurrentDeleteTableIT.testConcurrentFateOpsWithDelete
[ERROR] Run 1: ConcurrentDeleteTableIT.testConcurrentFateOpsWithDelete:213 » TestTimedOut tes...
[ERROR] Run 2: ConcurrentDeleteTableIT.testConcurrentFateOpsWithDelete » Appears to be stuck...
[INFO]
[ERROR] org.apache.accumulo.test.functional.HalfDeadTServerIT.testRecover
[ERROR] Run 1: HalfDeadTServerIT.testRecover:142->test:217->Object.wait:328->Object.wait:-2 » TestTimedOut
[ERROR] Run 2: HalfDeadTServerIT.testRecover » Appears to be stuck in thread Time-limited te...
[INFO]
[ERROR] org.apache.accumulo.test.functional.SslIT.adminStop
[ERROR] Run 1: SslIT.adminStop:68->Object.wait:328->Object.wait:-2 » TestTimedOut test timed ...
[ERROR] Run 2: SslIT.adminStop » Appears to be stuck in thread Time-limited test-SendThread(...

These tests fail consistently at every build attempt! The tests fail even when executed separately, e.g.:

mvn verify -Dit.test=ConcurrentDeleteTableIT -o -rf :accumulo-test

I am using the current 'main' branch of Accumulo.
JDK 11.0.11
Maven: 3.8.2
OS: Ubuntu 20.04.3 ARM64

Is there anything that could be done to fix these problems? For example, some config settings?

P.S. At https://github.com/apache/accumulo/issues/1884 I read that Linux ARM64 is a supported platform since the JVM supports it.

Thanks!
Mark