Jon Haddad created CASSANDRA-21197:
--------------------------------------
Summary: import not importing resulting in data loss with
analytics jobs
Key: CASSANDRA-21197
URL: https://issues.apache.org/jira/browse/CASSANDRA-21197
Project: Apache Cassandra
Issue Type: Bug
Components: Analytics Library
Reporter: Jon Haddad
When evaluating the analytics bulk writer, I found jobs were reported as
successful, but the data wasn't being correctly imported. I'm testing with C*
5.0.6, sidecar trunk (as of yesterday), and recent analytics code.
I've traced the network and filesystem calls and have found this is the series
of events:
1. The Spark job runs
2. The data lands on disk from sidecar
3. Import is called; C* says there is nothing to import
4. Sidecar then deletes the data files

The result is that all of my data is deleted from disk without the import ever
happening. I have tested this dozens of times a day for almost a week, and it
has happened 100% of the time.
I haven't yet determined why Cassandra doesn't import anything, but given the
nature of the issue I'm hoping more eyes on this will help. It's possible
there's something specific about my setup that's causing this issue - I know
there are quite a few tests around sidecar, so I'm surprised it's happening.
That said, if C* isn't correctly importing data, it should have a way of
telling sidecar, so that sidecar doesn't delete the results of a bulk write job.
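As a sketch of the guard suggested above (a hypothetical helper, not sidecar's actual cleanup path, which is Java): before deleting a staged import directory, check whether the SSTables are still sitting in it; if they are, the import did not consume them and deleting them would lose data.

```shell
# safe_cleanup DIR: remove a staged import directory only if the import
# actually consumed the SSTables, i.e. no -Data.db components remain.
# Hypothetical illustration of the guard; not sidecar's real code.
safe_cleanup() {
    dir="$1"
    if ls "$dir"/*-Data.db >/dev/null 2>&1; then
        echo "SSTables still present in $dir; refusing to delete" >&2
        return 1
    fi
    rm -rf "$dir"
}
```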
*Note*: the names of the files below might not all match up, since I've
gathered this over several days with about a dozen clusters and roughly 100
Spark jobs.
The [Spark job
runs|https://github.com/rustyrazorblade/easy-db-lab/blob/main/bin/submit-direct-bulk-writer].
The data files are written to disk, then renamed. I've captured that several
ways; the easiest way to see the rename is with sysdig:
{noformat}
sudo sysdig "evt.category=file and (proc.pid=24272 or proc.pid=30444)" | grep
'cassandra/import'{noformat}
Here's the relevant output, where the vertx process (sidecar) performs the
rename to the expected data file name:
{noformat}
2198732 01:17:36.437748828 1 vert.x-internal (30642) < rename res=0
oldpath=/mnt/db1/cassandra/import/0-0-1d50c5e6-8fbe-44c7-98ec-a06132e78c1f-e9293be0-1767-11f1-887e-0ff1d5cec701/bulk_test/data/oa-1-big-Index.db16346060661306473655.tmp
newpath=/mnt/db1/cassandra/import/0-0-1d50c5e6-8fbe-44c7-98ec-a06132e78c1f-e9293be0-1767-11f1-887e-0ff1d5cec701/bulk_test/data/oa-1-big-Index.db
2199993 01:17:36.450173069 6 vert.x-internal (30635) < rename res=0
oldpath=/mnt/db1/cassandra/import/0-0-1d50c5e6-8fbe-44c7-98ec-a06132e78c1f-e9293be0-1767-11f1-887e-0ff1d5cec701/bulk_test/data/oa-1-big-Filter.db4989982398684709072.tmp
newpath=/mnt/db1/cassandra/import/0-0-1d50c5e6-8fbe-44c7-98ec-a06132e78c1f-e9293be0-1767-11f1-887e-0ff1d5cec701/bulk_test/data/oa-1-big-Filter.db{noformat}
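For clarity, the rename strips sidecar's temporary random suffix, e.g. {{oa-1-big-Index.db16346060661306473655.tmp}} becomes {{oa-1-big-Index.db}}. A sketch of that mapping (a hypothetical helper mirroring what the trace shows, not sidecar's code):

```shell
# final_name TMPNAME: derive the final component name from the temporary
# upload name seen in the trace, by stripping the numeric suffix and .tmp.
final_name() {
    printf '%s\n' "$1" | sed -E 's/\.db[0-9]+\.tmp$/.db/'
}
```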
Import is then called on process 30528 (cassandra). I captured the filesystem
event where it reads the staged directory and receives 10 entries:
{noformat}
sudo strace -p 30528 -e trace=getdents64 -y 2>&1 | grep import
getdents64(402</mnt/db1/cassandra/import/0-0-28c91aa3-fcae-4c97-bf5a-e520f070e1f9-a0a1bdd0-176b-11f1-bc8d-55a3317257c0/bulk_test/data>,
0x7176a803a0c0 /* 10 entries */, 32768) = 392{noformat}
However, the log entry says nothing is imported:
{noformat}
INFO [RMI TCP Connection(92)-127.0.0.1] 2026-03-04 01:44:12,773 SSTableImporter.java:80 - [af506331-6517-4461-a10f-3846baaf30c6] Loading new SSTables for bulk_test/data: Options{srcPaths='[/mnt/db1/cassandra/import/0-0-28c91aa3-fcae-4c97-bf5a-e520f070e1f9-a0a1bdd0-176b-11f1-bc8d-55a3317257c0/bulk_test/data]', resetLevel=true, clearRepaired=true, verifySSTables=true, verifyTokens=true, invalidateCaches=true, extendedVerify=false, copyData=false, failOnMissingIndex=false, validateIndexChecksum=true}
INFO [RMI TCP Connection(92)-127.0.0.1] 2026-03-04 01:44:12,781 SSTableImporter.java:214 - [af506331-6517-4461-a10f-3846baaf30c6] No new SSTables were found for bulk_test/data{noformat}
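Incidentally, the 10 entries seen in the getdents64 capture above are consistent with {{.}} and {{..}} plus the eight component files of a single big-format SSTable. A quick way to sanity-check a staged directory for completeness (a sketch; the component list here is the usual one for the big format and could vary by version):

```shell
# check_components DIR PREFIX: print any missing component files for an
# SSTable with the given prefix (e.g. "oa-1-big") in a staged directory.
check_components() {
    dir="$1"; prefix="$2"
    for f in "$prefix-Data.db" "$prefix-Index.db" "$prefix-Filter.db" \
             "$prefix-CompressionInfo.db" "$prefix-Statistics.db" \
             "$prefix-Summary.db" "$prefix-TOC.txt" "$prefix-Digest.crc32"; do
        [ -e "$dir/$f" ] || echo "missing: $f"
    done
}
```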
Sidecar then comes along and unlinks the files, resulting in data loss:
{noformat}
2248856 01:17:37.778334683 1 vert.x-internal (30642) < unlink res=0
path=/mnt/db1/cassandra/import/0-0-1d50c5e6-8fbe-44c7-98ec-a06132e78c1f-e9293be0-1767-11f1-887e-0ff1d5cec701/bulk_test/data/oa-1-big-CompressionInfo.db
2248866 01:17:37.778345865 1 vert.x-internal (30642) < newfstatat res=0
dirfd=-100(AT_FDCWD)
path=/mnt/db1/cassandra/import/0-0-1d50c5e6-8fbe-44c7-98ec-a06132e78c1f-e9293be0-1767-11f1-887e-0ff1d5cec701/bulk_test/data/oa-1-big-Statistics.db
flags=256(AT_SYMLINK_NOFOLLOW)
2248868 01:17:37.778352848 1 vert.x-internal (30642) < newfstatat res=0
dirfd=-100(AT_FDCWD)
path=/mnt/db1/cassandra/import/0-0-1d50c5e6-8fbe-44c7-98ec-a06132e78c1f-e9293be0-1767-11f1-887e-0ff1d5cec701/bulk_test/data/oa-1-big-Statistics.db
flags=256(AT_SYMLINK_NOFOLLOW)
2248875 01:17:37.778370298 1 vert.x-internal (30642) < unlink res=0
path=/mnt/db1/cassandra/import/0-0-1d50c5e6-8fbe-44c7-98ec-a06132e78c1f-e9293be0-1767-11f1-887e-0ff1d5cec701/bulk_test/data/oa-1-big-Statistics.db{noformat}
I haven't yet determined why Cassandra doesn't import the data. It sees the
files in the directory listing, but there's no additional debug logging
available to identify why it doesn't consider them valid.
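One way to get more detail without a patch might be to raise the log level on the importer class via nodetool before re-running a job (a sketch; the fully-qualified class name is an assumption on my part, the log above only shows SSTableImporter.java):

```shell
# Hypothetical: bump the importer to TRACE, re-run the bulk write job,
# inspect system.log, then restore the level. Class name is assumed.
nodetool setlogginglevel org.apache.cassandra.db.SSTableImporter TRACE
# ... re-run the job and check system.log ...
nodetool setlogginglevel org.apache.cassandra.db.SSTableImporter INFO
```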
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]