[ 
https://issues.apache.org/jira/browse/CASSANDRA-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-21197:
-----------------------------------
    Description: 
When evaluating the analytics bulk writer I found jobs were reported as 
successful, but the data wasn't being correctly imported.  I'm testing with C* 
5.0.6, sidecar trunk (as of yesterday), and a recent build of the analytics 
code.  I've verified this is an issue with both single-token and 4-token 
clusters, using all C* defaults from the tarball release except for these overrides:
{noformat}
---
cluster_name: "test"
num_tokens: 1
seed_provider:
  class_name: "org.apache.cassandra.locator.SimpleSeedProvider"
  parameters:
    seeds: "10.14.1.95"
hints_directory: "/mnt/db1/cassandra/hints"
data_file_directories:
- "/mnt/db1/cassandra/data"
commitlog_directory: "/mnt/db1/cassandra/commitlog"
concurrent_reads: 64
concurrent_writes: 64
trickle_fsync: true
endpoint_snitch: "Ec2Snitch"{noformat}
I've traced the network and filesystem calls and found this series of events:

1. The Spark job runs
2. Data lands on disk from sidecar
3. Import is called; C* says there's nothing to import
4. Sidecar then deletes the data files

The result is that all of my data is deleted from disk without ever being 
imported.  I have tested this dozens of times a day for almost a week, and it 
has happened 100% of the time.
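To help isolate whether sidecar's import call is the problem, the same import can be attempted by hand with nodetool on a cluster where the upload directories still exist (e.g. with sidecar stopped before its cleanup runs).  A small sketch; the keyspace/table (bulk_test/data) and the import root are from this setup, everything else is an assumption:
{noformat}
# print_import_commands ROOT: for each sidecar upload directory under ROOT,
# print the nodetool command that would re-attempt the import by hand,
# bypassing sidecar entirely.
print_import_commands() {
  for dir in "$1"/*/bulk_test/data; do
    [ -d "$dir" ] || continue
    echo "nodetool import bulk_test data $dir"
  done
}

# Against this cluster's import root:
print_import_commands /mnt/db1/cassandra/import{noformat}
If a by-hand import also reports no new SSTables, that would rule sidecar's request out and point squarely at C*.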

I haven't yet determined why Cassandra doesn't import anything, but given the 
nature of the issue I'm hoping more eyes on this will help.  It's possible 
something specific to my setup is causing this; I know there are quite a few 
tests around sidecar, so I'm surprised it's happening.

That said, if C* isn't correctly importing data, it should have a way of 
telling sidecar, so that sidecar doesn't delete the results of a bulk write job.

{*}Note{*}: the file names below might not all match up, since I've gathered 
these traces over several days with about a dozen clusters and 100 spark jobs.

The [Spark job 
runs|https://github.com/rustyrazorblade/easy-db-lab/blob/main/bin/submit-direct-bulk-writer].
  The data files are written to disk, then renamed.  I've captured that several 
ways; the easiest is to watch the rename with sysdig:
{noformat}
sudo sysdig "evt.category=file and (proc.pid=24272 or proc.pid=30444)" | grep 
'cassandra/import'{noformat}
Here's the relevant output, where the vertx process (sidecar) performs the 
rename to the expected data file name:
{noformat}
2198732 01:17:36.437748828 1 vert.x-internal (30642) < rename res=0 
oldpath=/mnt/db1/cassandra/import/0-0-1d50c5e6-8fbe-44c7-98ec-a06132e78c1f-e9293be0-1767-11f1-887e-0ff1d5cec701/bulk_test/data/oa-1-big-Index.db16346060661306473655.tmp
 
newpath=/mnt/db1/cassandra/import/0-0-1d50c5e6-8fbe-44c7-98ec-a06132e78c1f-e9293be0-1767-11f1-887e-0ff1d5cec701/bulk_test/data/oa-1-big-Index.db

2199993 01:17:36.450173069 6 vert.x-internal (30635) < rename res=0 
oldpath=/mnt/db1/cassandra/import/0-0-1d50c5e6-8fbe-44c7-98ec-a06132e78c1f-e9293be0-1767-11f1-887e-0ff1d5cec701/bulk_test/data/oa-1-big-Filter.db4989982398684709072.tmp
 
newpath=/mnt/db1/cassandra/import/0-0-1d50c5e6-8fbe-44c7-98ec-a06132e78c1f-e9293be0-1767-11f1-887e-0ff1d5cec701/bulk_test/data/oa-1-big-Filter.db{noformat}
Import is then called on the Cassandra process (pid 30528).  I captured the 
filesystem event where it reads 10 directory entries:
{noformat}
sudo strace -p 30528 -e trace=getdents64 -y 2>&1 | grep import

getdents64(402</mnt/db1/cassandra/import/0-0-28c91aa3-fcae-4c97-bf5a-e520f070e1f9-a0a1bdd0-176b-11f1-bc8d-55a3317257c0/bulk_test/data>,
 0x7176a803a0c0 /* 10 entries */, 32768) = 392{noformat}
but the log says no new SSTables were found:

{noformat}
INFO [RMI TCP Connection(92)-127.0.0.1] 2026-03-04 01:44:12,773 
SSTableImporter.java:80 - [af506331-6517-4461-a10f-3846baaf30c6] Loading new 
SSTables for bulk_test/data: 
Options{srcPaths='[/mnt/db1/cassandra/import/0-0-28c91aa3-fcae-4c97-bf5a-e520f070e1f9-a0a1bdd0-176b-11f1-bc8d-55a3317257c0/bulk_test/data]',
 resetLevel=true, clearRepaired=true, verifySSTables=true, verifyTokens=true, 
invalidateCaches=true, extendedVerify=false, copyData= false, 
failOnMissingIndex= false, validateIndexChecksum= true} 

INFO [RMI TCP Connection(92)-127.0.0.1] 2026-03-04 01:44:12,781 
SSTableImporter.java:214 - [af506331-6517-4461-a10f-3846baaf30c6] No new 
SSTables were found for bulk_test/data{noformat}
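One possibility worth ruling out (an assumption on my part, not confirmed against the importer code): SSTable discovery may key off the -Data.db component, so a directory holding only the auxiliary components would look empty to the importer.  A quick sanity check on an upload directory:
{noformat}
# has_data_component DIR: succeed if DIR contains at least one SSTable
# -Data.db component.  Assumption: the importer's sstable discovery is
# driven by the Data component, so a dir without one imports nothing.
has_data_component() {
  for f in "$1"/*-Data.db; do
    [ -e "$f" ] && return 0
  done
  echo "no -Data.db component in $1" >&2
  return 1
}

if has_data_component /mnt/db1/cassandra/import/0-0-28c91aa3-fcae-4c97-bf5a-e520f070e1f9-a0a1bdd0-176b-11f1-bc8d-55a3317257c0/bulk_test/data; then
  echo "Data component present"
fi{noformat}
Notably, the sysdig excerpt above only shows the Index.db and Filter.db renames; if the Data component's rename hadn't completed by the time import was called, that would be consistent with this behavior.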
Sidecar then comes along and unlinks the files, resulting in data loss:

{noformat}
2248856 01:17:37.778334683 1 vert.x-internal (30642) < unlink res=0 
path=/mnt/db1/cassandra/import/0-0-1d50c5e6-8fbe-44c7-98ec-a06132e78c1f-e9293be0-1767-11f1-887e-0ff1d5cec701/bulk_test/data/oa-1-big-CompressionInfo.db

2248866 01:17:37.778345865 1 vert.x-internal (30642) < newfstatat res=0 
dirfd=-100(AT_FDCWD) 
path=/mnt/db1/cassandra/import/0-0-1d50c5e6-8fbe-44c7-98ec-a06132e78c1f-e9293be0-1767-11f1-887e-0ff1d5cec701/bulk_test/data/oa-1-big-Statistics.db
 flags=256(AT_SYMLINK_NOFOLLOW) 

2248868 01:17:37.778352848 1 vert.x-internal (30642) < newfstatat res=0 
dirfd=-100(AT_FDCWD) 
path=/mnt/db1/cassandra/import/0-0-1d50c5e6-8fbe-44c7-98ec-a06132e78c1f-e9293be0-1767-11f1-887e-0ff1d5cec701/bulk_test/data/oa-1-big-Statistics.db
 flags=256(AT_SYMLINK_NOFOLLOW) 

2248875 01:17:37.778370298 1 vert.x-internal (30642) < unlink res=0 
path=/mnt/db1/cassandra/import/0-0-1d50c5e6-8fbe-44c7-98ec-a06132e78c1f-e9293be0-1767-11f1-887e-0ff1d5cec701/bulk_test/data/oa-1-big-Statistics.db{noformat}
 

I haven't yet determined why Cassandra doesn't import the data.  It sees the 
files in the directory listing, but there's no additional debug output 
available to explain why it doesn't consider them valid.
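In the meantime, the importer's logging can be turned up at runtime, which might surface the reason the files are being skipped (the fully qualified class name here is inferred from the SSTableImporter.java lines above, so verify it against the logback config before relying on it):
{noformat}
# Raise importer log verbosity at runtime, no restart needed
nodetool setlogginglevel org.apache.cassandra.db.SSTableImporter DEBUG

# Revert once the failing import has been captured
nodetool setlogginglevel org.apache.cassandra.db.SSTableImporter INFO{noformat}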

 

 



> import not importing resulting in data loss with analytics jobs
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-21197
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21197
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Analytics Library, Sidecar
>            Reporter: Jon Haddad
>            Priority: Normal
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
