[ https://issues.apache.org/jira/browse/CASSANDRA-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Haddad updated CASSANDRA-21197:
-----------------------------------
    Summary: nodetool import silently failing resulting in data loss with 
analytics jobs  (was: nodetool import silently not importing resulting in data 
loss with analytics jobs)

> nodetool import silently failing resulting in data loss with analytics jobs
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-21197
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21197
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Analytics Library, Sidecar
>            Reporter: Jon Haddad
>            Priority: Normal
>
> When evaluating the analytics bulk writer, I found that jobs were reported as 
> successful but the data wasn't actually being imported.  I'm testing with 
> C* 5.0.6, sidecar trunk (as of yesterday), and a recent checkout of the 
> analytics code.  I've reproduced this on both single-token and 4-token 
> clusters, using the C* defaults from the tarball release except for these 
> settings:
> {noformat}
> ---
> cluster_name: "test"
> num_tokens: 1
> seed_provider:
>   class_name: "org.apache.cassandra.locator.SimpleSeedProvider"
>   parameters:
>     seeds: "10.14.1.95"
> hints_directory: "/mnt/db1/cassandra/hints"
> data_file_directories:
> - "/mnt/db1/cassandra/data"
> commitlog_directory: "/mnt/db1/cassandra/commitlog"
> concurrent_reads: 64
> concurrent_writes: 64
> trickle_fsync: true
> endpoint_snitch: "Ec2Snitch"{noformat}
> I've traced the network and filesystem calls and found the following 
> sequence of events:
> 1. Spark job runs
> 2. data lands on disk from sidecar
> 3. import is called, C* says nothing to import
> 4. sidecar then deletes the data files
> The result is that all my data is deleted from disk without ever being 
> imported.  I have tested this dozens of times a day for almost a week and it 
> has happened 100% of the time.
> I haven't yet determined why Cassandra doesn't import anything, but given the 
> nature of the issue I'm hoping more eyes on this will help.  It's possible 
> there's something specific about my setup that's causing this issue - I know 
> there are quite a few tests around sidecar, so I'm surprised it's happening. 
> That said, if C* doesn't import the data, it should have a way to signal 
> that to sidecar so that sidecar doesn't delete the results of a bulk write 
> job.
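
The safeguard asked for above could look roughly like the following. This is a minimal sketch, not sidecar's or Cassandra's actual code: `ImportOutcome`, `safe_to_delete`, and `cleanup` are hypothetical names, and it assumes the component doing the cleanup can learn how many SSTables the import reported as loaded.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ImportOutcome:
    """Hypothetical summary of one import call; not a real Cassandra or sidecar type."""
    staged_files: int       # files present in the staging directory before import
    imported_sstables: int  # SSTables the import reported as loaded
    failed: bool            # True if the import call itself reported a failure


def safe_to_delete(outcome: ImportOutcome) -> bool:
    """Allow cleanup only when the import demonstrably consumed the staged data."""
    if outcome.failed:
        return False
    # Files were staged but nothing was imported: treat as an error and keep the data.
    if outcome.staged_files > 0 and outcome.imported_sstables == 0:
        return False
    return True


def cleanup(staging_dir: Path, outcome: ImportOutcome) -> bool:
    """Delete staged files only if safe_to_delete() agrees; return True if deleted."""
    if not safe_to_delete(outcome):
        return False
    for path in staging_dir.iterdir():
        if path.is_file():
            path.unlink()
    return True
```

With a guard like this, the "nothing to import" case in step 3 would leave the staged files in place for inspection instead of proceeding to step 4.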
> {*}Note{*}: the file names in the excerpts below might not match up, because 
> I've run this over several days now with about a dozen clusters and 100 Spark 
> jobs.
> [Spark job 
> runs|https://github.com/rustyrazorblade/easy-db-lab/blob/main/bin/submit-direct-bulk-writer].
>   The data files are written to disk, then renamed.  I've captured that 
> several ways; the easiest place to see the rename is this sysdig 
> capture:
> {noformat}
> sudo sysdig "evt.category=file and (proc.pid=24272 or proc.pid=30444)" | grep 
> 'cassandra/import'{noformat}
> Here's the relevant output, where the vertx process (sidecar) performs the 
> rename to the expected data file name:
> {noformat}
> 2198732 01:17:36.437748828 1 vert.x-internal (30642) < rename res=0 
> oldpath=/mnt/db1/cassandra/import/0-0-1d50c5e6-8fbe-44c7-98ec-a06132e78c1f-e9293be0-1767-11f1-887e-0ff1d5cec701/bulk_test/data/oa-1-big-Index.db16346060661306473655.tmp
>  
> newpath=/mnt/db1/cassandra/import/0-0-1d50c5e6-8fbe-44c7-98ec-a06132e78c1f-e9293be0-1767-11f1-887e-0ff1d5cec701/bulk_test/data/oa-1-big-Index.db
> 2199993 01:17:36.450173069 6 vert.x-internal (30635) < rename res=0 
> oldpath=/mnt/db1/cassandra/import/0-0-1d50c5e6-8fbe-44c7-98ec-a06132e78c1f-e9293be0-1767-11f1-887e-0ff1d5cec701/bulk_test/data/oa-1-big-Filter.db4989982398684709072.tmp
>  
> newpath=/mnt/db1/cassandra/import/0-0-1d50c5e6-8fbe-44c7-98ec-a06132e78c1f-e9293be0-1767-11f1-887e-0ff1d5cec701/bulk_test/data/oa-1-big-Filter.db{noformat}
>  
>  
> Import is then called on process 30528 (cassandra).  I captured the 
> getdents64 call in which it reads 10 directory entries:
> {noformat}
> sudo strace -p 30528 -e trace=getdents64 -y 2>&1 | grep import
> getdents64(402</mnt/db1/cassandra/import/0-0-28c91aa3-fcae-4c97-bf5a-e520f070e1f9-a0a1bdd0-176b-11f1-bc8d-55a3317257c0/bulk_test/data>,
>  0x7176a803a0c0 /* 10 entries */, 32768) = 392{noformat}
> But the log says nothing was imported:
>  
> {noformat}
> INFO [RMI TCP Connection(92)-127.0.0.1] 2026-03-04 01:44:12,773 
> SSTableImporter.java:80 - [af506331-6517-4461-a10f-3846baaf30c6] Loading new 
> SSTables for bulk_test/data: 
> Options{srcPaths='[/mnt/db1/cassandra/import/0-0-28c91aa3-fcae-4c97-bf5a-e520f070e1f9-a0a1bdd0-176b-11f1-bc8d-55a3317257c0/bulk_test/data]',
>  resetLevel=true, clearRepaired=true, verifySSTables=true, verifyTokens=true, 
> invalidateCaches=true, extendedVerify=false, copyData= false, 
> failOnMissingIndex= false, validateIndexChecksum= true} 
> INFO [RMI TCP Connection(92)-127.0.0.1] 2026-03-04 01:44:12,781 
> SSTableImporter.java:214 - [af506331-6517-4461-a10f-3846baaf30c6] No new 
> SSTables were found for bulk_test/data{noformat}
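
To spot this case across many runs, the log could be scanned for the message shown above. The following is a rough sketch: the pattern is based only on the two log lines quoted in this report, and `find_empty_imports` is a hypothetical helper, not part of Cassandra or sidecar.

```python
import re

# Based on the "No new SSTables were found for <keyspace>/<table>" line quoted above.
# \s+ is used between words so the pattern also tolerates line-wrapped copies of the log.
NO_SSTABLES_RE = re.compile(
    r"\[(?P<import_id>[0-9a-f-]{36})\]\s+No\s+new\s+SSTables\s+were\s+found\s+for\s+"
    r"(?P<table>\S+)"
)


def find_empty_imports(log_text: str) -> list[tuple[str, str]]:
    """Return (import_id, keyspace/table) for every import that found no SSTables."""
    return [(m["import_id"], m["table"]) for m in NO_SSTABLES_RE.finditer(log_text)]
```

Matching the returned import ID against the earlier "Loading new SSTables" line with the same ID gives the `srcPaths` directory that was scanned.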
> Sidecar then unlinks the files, resulting in data loss:
>  
> {noformat}
> 2248856 01:17:37.778334683 1 vert.x-internal (30642) < unlink res=0 
> path=/mnt/db1/cassandra/import/0-0-1d50c5e6-8fbe-44c7-98ec-a06132e78c1f-e9293be0-1767-11f1-887e-0ff1d5cec701/bulk_test/data/oa-1-big-CompressionInfo.db
> 2248866 01:17:37.778345865 1 vert.x-internal (30642) < newfstatat res=0 
> dirfd=-100(AT_FDCWD) 
> path=/mnt/db1/cassandra/import/0-0-1d50c5e6-8fbe-44c7-98ec-a06132e78c1f-e9293be0-1767-11f1-887e-0ff1d5cec701/bulk_test/data/oa-1-big-Statistics.db
>  flags=256(AT_SYMLINK_NOFOLLOW) 
> 2248868 01:17:37.778352848 1 vert.x-internal (30642) < newfstatat res=0 
> dirfd=-100(AT_FDCWD) 
> path=/mnt/db1/cassandra/import/0-0-1d50c5e6-8fbe-44c7-98ec-a06132e78c1f-e9293be0-1767-11f1-887e-0ff1d5cec701/bulk_test/data/oa-1-big-Statistics.db
>  flags=256(AT_SYMLINK_NOFOLLOW) 
> 2248875 01:17:37.778370298 1 vert.x-internal (30642) < unlink res=0 
> path=/mnt/db1/cassandra/import/0-0-1d50c5e6-8fbe-44c7-98ec-a06132e78c1f-e9293be0-1767-11f1-887e-0ff1d5cec701/bulk_test/data/oa-1-big-Statistics.db{noformat}
>  
> I haven't yet determined why Cassandra doesn't import the data.  It sees the 
> files in the directory listing, but there's no additional debug output 
> available to identify why it doesn't consider them valid.
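
Until that cause is known, one way to narrow it down is to snapshot the import directory just before import is called and check which components each SSTable has. This is a sketch under assumptions: the file name pattern is inferred from names such as `oa-1-big-Index.db` in the traces above, and the `expected` set passed to `missing_components` is chosen by the caller, not a statement of what Cassandra's importer requires.

```python
import re
from collections import defaultdict
from pathlib import Path

# Inferred from names like "oa-1-big-Index.db": <version>-<id>-<format>-<Component>.<ext>
COMPONENT_RE = re.compile(
    r"^(?P<sstable>[a-z]{2}-[^-]+-[a-z]+)-(?P<component>[A-Za-z]+)\.(?:db|txt|crc32)$"
)


def group_components(directory: Path) -> tuple[dict[str, set[str]], list[str]]:
    """Group component files by SSTable prefix; also return names that did not match."""
    grouped: dict[str, set[str]] = defaultdict(set)
    unmatched: list[str] = []
    for entry in sorted(directory.iterdir()):
        if not entry.is_file():
            continue
        match = COMPONENT_RE.match(entry.name)
        if match:
            grouped[match["sstable"]].add(match["component"])
        else:
            # For example, a leftover "....db<digits>.tmp" file that was never renamed.
            unmatched.append(entry.name)
    return dict(grouped), unmatched


def missing_components(grouped: dict[str, set[str]], expected: set[str]) -> dict[str, set[str]]:
    """Report, per SSTable, which of the caller-chosen expected components are absent."""
    return {name: expected - found for name, found in grouped.items() if expected - found}
```

Comparing this output between a run that imports correctly and one that does not would show whether the two directories differ in their set of files at all.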
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
