[ https://issues.apache.org/jira/browse/CASSANDRA-20490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17952035#comment-17952035 ]
Stefan Miklosovic commented on CASSANDRA-20490:
-----------------------------------------------

Bunch of failures but nothing related to this PR at least ... it is what it is.

[CASSANDRA-20490|https://github.com/instaclustr/cassandra/tree/CASSANDRA-20490]

{noformat}
java17_pre-commit_tests
  ✓ j17_build 10m 21s
  ✓ j17_cqlsh_dtests_py311 7m 37s
  ✓ j17_cqlsh_dtests_py311_vnode 7m 44s
  ✓ j17_cqlsh_dtests_py38 7m 8s
  ✓ j17_cqlsh_dtests_py38_vnode 7m 27s
  ✓ j17_cqlshlib_cython_tests 12m 50s
  ✓ j17_cqlshlib_tests 9m 20s
  ✓ j17_dtests_latest 43m 44s
  ✓ j17_dtests_vnode 43m 36s
  ✓ j17_jvm_dtests_latest_vnode_repeat 1h 15m 25s
  ✓ j17_jvm_dtests_repeat 1h 15m 28s
  ✓ j17_unit_tests_repeat 7m 29s
  ✓ j17_utests_latest_repeat 8m 26s
  ✓ j17_utests_oa_repeat 7m 34s
  ✕ j17_dtests 51m 5s
      refresh_test.TestRefresh test_refresh_deadlock_startup
  ✕ j17_jvm_dtests 30m 18s
      org.apache.cassandra.fuzz.topology.HarryOnAccordTopologyMixupTest test
      org.apache.cassandra.fuzz.sai.MultiNodeSAITest indexOnlySaiTest TIMEOUTED
  ✕ j17_jvm_dtests_latest_vnode 30m 58s
      junit.framework.TestSuite org.apache.cassandra.distributed.test.accord.MigrationFromAccordReadRaceTest
      junit.framework.TestSuite org.apache.cassandra.distributed.test.accord.MigrationFromAccordReadRaceTest
      junit.framework.TestSuite org.apache.cassandra.distributed.test.accord.MigrationFromAccordWriteRaceTest
      junit.framework.TestSuite org.apache.cassandra.distributed.test.accord.MigrationFromAccordWriteRaceTest
      junit.framework.TestSuite org.apache.cassandra.distributed.test.accord.MigrationToAccordWriteRaceTest
      junit.framework.TestSuite org.apache.cassandra.distributed.test.accord.MigrationToAccordWriteRaceTest
      junit.framework.TestSuite org.apache.cassandra.distributed.test.accord.AccordMigrationTest
      junit.framework.TestSuite org.apache.cassandra.distributed.test.accord.AccordMigrationTest
  ✕ j17_unit_tests 17m 49s
      org.apache.cassandra.io.sstable.SSTableReaderTest testSpannedIndexPositions TIMEOUTED
  ✕ j17_utests_latest 18m 20s
      org.apache.cassandra.db.lifecycle.LogTransactionTest testStatsTSMismatchDuringStart
      org.apache.cassandra.db.lifecycle.LogTransactionTest testWrongTimestampInTxnFile
      org.apache.cassandra.db.lifecycle.LogTransactionTest testStatsTSMismatchDuringList
  ✕ j17_utests_oa 17m 54s
      org.apache.cassandra.io.sstable.SSTableReaderTest testSpannedIndexPositions TIMEOUTED
java11_pre-commit_tests
  ✓ j11_build 10m 24s
  ✓ j11_cqlsh_dtests_py311 9m 1s
  ✓ j11_cqlsh_dtests_py311_vnode 8m 38s
  ✓ j11_cqlsh_dtests_py38 7m 3s
  ✓ j11_cqlsh_dtests_py38_vnode 8m 15s
  ✓ j11_cqlshlib_cython_tests 9m 50s
  ✓ j11_cqlshlib_tests 11m 35s
  ✓ j11_dtests_latest 45m 51s
  ✓ j11_dtests_vnode 42m 30s
  ✓ j11_jvm_dtests_latest_vnode_repeat 1h 15m 47s
  ✓ j11_jvm_dtests_repeat 1h 13m 41s
  ✓ j11_unit_tests_repeat 8m 42s
  ✓ j11_utests_latest_repeat 8m 26s
  ✓ j11_utests_oa_repeat 7m 44s
  ✓ j11_utests_system_keyspace_directory_repeat 7m 56s
  ✓ j17_cqlsh_dtests_py311 6m 50s
  ✓ j17_cqlsh_dtests_py311_vnode 7m 32s
  ✓ j17_cqlsh_dtests_py38 6m 55s
  ✓ j17_cqlsh_dtests_py38_vnode 7m 38s
  ✓ j17_cqlshlib_cython_tests 9m 25s
  ✓ j17_cqlshlib_tests 8m 42s
  ✓ j17_dtests_latest 42m 30s
  ✓ j17_dtests_vnode 45m 39s
  ✓ j17_jvm_dtests_latest_vnode_repeat 1h 13m 36s
  ✓ j17_jvm_dtests_repeat 1h 13m 16s
  ✓ j17_unit_tests_repeat 8m 4s
  ✓ j17_utests_latest_repeat 8m 15s
  ✓ j17_utests_oa_repeat 11m 45s
  j11_dtests 50m 51s
  ✕ j11_jvm_dtests 31m 59s
      org.apache.cassandra.fuzz.sai.MultiNodeSAITest indexOnlySaiTest TIMEOUTED
  ✕ j11_jvm_dtests_latest_vnode 32m 27s
      junit.framework.TestSuite org.apache.cassandra.distributed.test.accord.MigrationFromAccordReadRaceTest
      junit.framework.TestSuite org.apache.cassandra.distributed.test.accord.MigrationFromAccordReadRaceTest
      junit.framework.TestSuite org.apache.cassandra.distributed.test.accord.MigrationFromAccordWriteRaceTest
      junit.framework.TestSuite org.apache.cassandra.distributed.test.accord.MigrationFromAccordWriteRaceTest
      junit.framework.TestSuite org.apache.cassandra.distributed.test.accord.MigrationToAccordWriteRaceTest
      junit.framework.TestSuite org.apache.cassandra.distributed.test.accord.MigrationToAccordWriteRaceTest
      junit.framework.TestSuite org.apache.cassandra.distributed.test.accord.AccordMigrationTest
      junit.framework.TestSuite org.apache.cassandra.distributed.test.accord.AccordMigrationTest
      org.apache.cassandra.distributed.test.cql3.MixedReadsAccordInteropMultiNodeTokenConflictTest test
  ✕ j11_simulator_dtests 20m 16s
      org.apache.cassandra.simulator.test.AccordHarrySimulationTest test
      org.apache.cassandra.simulator.test.HarrySimulatorTest test
  ✕ j11_unit_tests 20m 8s
      org.apache.cassandra.io.sstable.SSTableReaderTest testSpannedIndexPositions TIMEOUTED
      org.apache.cassandra.transport.AuthMessageSizeLimitTest sendTooBigAuthMultiFrameMessage
  ✕ j11_utests_latest 19m 26s
      org.apache.cassandra.db.lifecycle.LogTransactionTest testStatsTSMismatchDuringStart
      org.apache.cassandra.db.lifecycle.LogTransactionTest testWrongTimestampInTxnFile
      org.apache.cassandra.db.lifecycle.LogTransactionTest testStatsTSMismatchDuringList
  ✕ j11_utests_oa 19m 11s
      org.apache.cassandra.io.sstable.SSTableReaderTest testSpannedIndexPositions TIMEOUTED
  ✕ j11_utests_system_keyspace_directory 21m 2s
      org.apache.cassandra.io.sstable.SSTableReaderTest testSpannedIndexPositions TIMEOUTED
  ✕ j17_dtests 38m 16s
      refresh_test.TestRefresh test_refresh_deadlock_startup
  ✕ j17_jvm_dtests 30m 35s
      org.apache.cassandra.fuzz.sai.MultiNodeSAITest indexOnlySaiTest TIMEOUTED
  ✕ j17_jvm_dtests_latest_vnode 30m 7s
      junit.framework.TestSuite org.apache.cassandra.distributed.test.accord.MigrationFromAccordReadRaceTest
      junit.framework.TestSuite org.apache.cassandra.distributed.test.accord.MigrationFromAccordReadRaceTest
      junit.framework.TestSuite org.apache.cassandra.distributed.test.accord.MigrationFromAccordWriteRaceTest
      junit.framework.TestSuite org.apache.cassandra.distributed.test.accord.MigrationFromAccordWriteRaceTest
      junit.framework.TestSuite org.apache.cassandra.distributed.test.accord.MigrationToAccordWriteRaceTest
      junit.framework.TestSuite org.apache.cassandra.distributed.test.accord.MigrationToAccordWriteRaceTest
      junit.framework.TestSuite org.apache.cassandra.distributed.test.accord.AccordMigrationTest
      junit.framework.TestSuite org.apache.cassandra.distributed.test.accord.AccordMigrationTest
      org.apache.cassandra.fuzz.sai.MultiNodeSAITest indexOnlySaiTest TIMEOUTED
  ✕ j17_unit_tests 18m 5s
      org.apache.cassandra.net.ConnectionTest testTimeout
      org.apache.cassandra.io.sstable.SSTableReaderTest testSpannedIndexPositions TIMEOUTED
  ✕ j17_utests_latest 18m 10s
      org.apache.cassandra.db.lifecycle.LogTransactionTest testStatsTSMismatchDuringStart
      org.apache.cassandra.db.lifecycle.LogTransactionTest testWrongTimestampInTxnFile
      org.apache.cassandra.db.lifecycle.LogTransactionTest testStatsTSMismatchDuringList
  ✕ j17_utests_oa 19m 2s
      org.apache.cassandra.io.sstable.SSTableReaderTest testSpannedIndexPositions TIMEOUTED
{noformat}

[java17_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/5805/workflows/036728ed-e5b0-4e89-8cb0-c9e98d57ec4d]
[java11_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/5805/workflows/d6df2016-76b9-489f-b445-ff736fd159c1]

> Encountred "duplicate hardlink error" when repairing
> ----------------------------------------------------
>
> Key: CASSANDRA-20490
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20490
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Consistency/Repair
> Reporter: Stefan Miklosovic
> Assignee: Stefan Miklosovic
> Priority: Normal
> Fix For: 5.x
>
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> A user reported:
> Hi all,
> we experience the following issue when executing full sequential repairs in
> Cassandra 4.0.10.
> ERROR [RepairSnapshotExecutor:1] 2023-11-07 13:22:50,267 CassandraDaemon.java:581 - Exception in thread Thread[RepairSnapshotExecutor:1,5,main]
> java.lang.RuntimeException: Tried to create duplicate hard link to /opt/ddb/data/pool/data1/test_keyspace/test1-c4b33340f0a211edb0cb2fb04a4be304/snapshots/bec3dba0-7d70-11ee-99d3-7bda513c2b90/nb-1-big-Filter.db
> at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:185)
> at org.apache.cassandra.io.sstable.format.SSTableReader.createLinks(SSTableReader.java:1624)
> at org.apache.cassandra.io.sstable.format.SSTableReader.createLinks(SSTableReader.java:1606)
> at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1852)
> at org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:2031)
> at org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:2017)
> at org.apache.cassandra.db.repair.CassandraTableRepairManager.lambda$snapshot$0(CassandraTableRepairManager.java:74)
> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> at java.util.concurrent.FutureTask.run(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(Unknown Source)
> ERROR [AntiEntropyStage:1] 2023-11-07 13:22:50,267 CassandraDaemon.java:581 - Exception in thread Thread[AntiEntropyStage:1,5,main]
> java.lang.RuntimeException: java.lang.RuntimeException: Unable to take a snapshot bec3dba0-7d70-11ee-99d3-7bda513c2b90 on test_keyspace/test1
> This behavior is reproduced consistently, when the following are true:
> * It is a normal sequential repair (--full and --sequential),
> * It is not a global repair, meaning at least one datacenter is defined
> (--in-dc or --in-local-dc),
> * The repair affects more than two Cassandra nodes.
> For more than two Cassandra nodes the parent repair session consists of
> multiple separate repair sessions towards different target endpoints. Full
> sequential repairs require that all participants flush and snapshot the data
> before starting the repair. Unfortunately, there is a collision between the
> separate repair sessions. The first one creates the ephemeral snapshot
> successfully and the second one that tries to create a snapshot (create hard
> link) in the same node fails with the above error.
> This issue is not seen in global repairs, where datacenters and hosts are not
> defined, because in that case there is an explicit check if a snapshot
> already exists before proceeding.
> I found a few issues in Jira about duplicate hard links, but all of them are
> from older versions and seem irrelevant to this one. Could you please help
> with this issue?
> Thank you,
> Panagiotis
> https://lists.apache.org/thread/kwz89po5gkx68bhof7l7o0yykz48bnbw
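
For illustration only, the collision described in the report can be reproduced outside Cassandra with plain java.nio. The sketch below is not Cassandra code and not the fix in this PR: the class SnapshotLinkSketch and its methods snapshot and snapshotIfAbsent are hypothetical names. It assumes a filesystem that supports hard links, and it assumes the "explicit check" mentioned for global repairs amounts to skipping the snapshot when its directory already exists.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch, not Cassandra code.
public class SnapshotLinkSketch {

    // Naive snapshot: hard-link every data file into the snapshot directory.
    // A second call for the same directory fails on the first existing link.
    static void snapshot(Path snapshotDir, List<Path> files) throws IOException {
        Files.createDirectories(snapshotDir);
        for (Path file : files) {
            Files.createLink(snapshotDir.resolve(file.getFileName()), file);
        }
    }

    // Guarded snapshot: skip the work when the snapshot directory already exists.
    // This check-then-act is not atomic, so truly concurrent callers would still
    // need extra coordination.
    static boolean snapshotIfAbsent(Path snapshotDir, List<Path> files) throws IOException {
        if (Files.exists(snapshotDir)) {
            return false;
        }
        snapshot(snapshotDir, files);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path data = Files.createTempDirectory("sketch-data");
        Path sstable = Files.createFile(data.resolve("nb-1-big-Filter.db"));
        Path snapshotDir = data.resolve("snapshots").resolve("session-id");

        // First "repair session" snapshots successfully.
        snapshot(snapshotDir, List.of(sstable));

        // Second "repair session" on the same node collides on the same link.
        try {
            snapshot(snapshotDir, List.of(sstable));
        } catch (FileAlreadyExistsException e) {
            System.out.println("duplicate hard link: " + e.getFile());
        }

        // With the existence check, the second attempt is skipped instead.
        System.out.println("snapshot taken: " + snapshotIfAbsent(snapshotDir, List.of(sstable)));
    }
}
```

Both calls in main share one snapshot directory, mirroring the report where separate repair sessions of the same parent session use the same snapshot name on a node.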