Default permissions for /var/lib/cassandra are world-readable

2022-03-22 Thread Sebastian Schulze

Hi all!

After doing some maintenance work on one of our Cassandra nodes, I noticed that the default 
permissions for /var/lib/cassandra and everything below seem to be "world readable", e.g. 
"drwxr-xr-x  6 cassandra cassandra".

This might depend on the distribution / package used, but I can at least 
confirm this for the official Cassandra Debian packages as well as the Docker 
containers. Out of curiosity I compared it to Postgres and MySQL to see which 
defaults they would opt for and they are

drwxr-x--- 2 mysql mysql 4.0K Mar 22 10:00  mysql

and respectively

drwx------ 19 postgres postgres 4.0K Mar 22 10:01 data

which is way more appropriate in my opinion. (See [0] for the Gist and the 
script to test it)

Does anyone know the reasoning for leaving the directories world readable? In 
our own setup we now locked it down to the Cassandra user and group and haven't 
had any problems with it so far.
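
For anyone who wants to apply the same lockdown, it boils down to a
chown/chmod on the data directory; a minimal sketch, assuming the default
Debian package layout and the "cassandra" service user:

  # keep the data directory readable by the cassandra user/group only
  chown -R cassandra:cassandra /var/lib/cassandra
  chmod -R o-rwx /var/lib/cassandra

The same applies to the commitlog and hints directories if they are
configured to live outside /var/lib/cassandra.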

Best,
 Bascht

[0] https://gist.github.com/bascht/31fa749d4121b9898902d5d557a01f82


Re: Default permissions for /var/lib/cassandra are world-readable

2022-03-22 Thread Paulo Motta
Hi Sebastian,

I'm not aware of any reasoning behind this choice (happy to be corrected),
but I think it wouldn't hurt to have better default permissions.

Feel free to open a JIRA ticket to suggest this change on
https://issues.apache.org/jira/projects/CASSANDRA/summary

On Tue, 22 Mar 2022 at 08:10, Sebastian Schulze wrote:

> Hi all!
>
> After doing some maintenance work on one of our Cassandra notes, I noticed
> that the default permissions for /var/lib/cassandra and everything below
> seem to be "world readable", e.g. "drwxr-xr-x  6 cassandra cassandra".
>
> This might depend on the distribution / package used, but I can at least
> confirm this for the official Cassandra Debian packages as well as the
> Docker containers. Out of curiosity I compared it to Postgres and MySQL to
> see which defaults they would opt for and they are
>
> drwxr-x--- 2 mysql mysql 4.0K Mar 22 10:00  mysql
>
> and respectively
>
> drwx-- 19 postgres postgres 4.0K Mar 22 10:01 data
>
> which is way more appropriate in my option. (See [0] for the Gist and the
> script to test it)
>
> Does anyone know the reasoning for leaving the directories world readable?
> In our own setup we now locked it down to the Cassandra user and group and
> haven't had any problems with it so far.
>
> Best,
>   Bascht
>
> [0] https://gist.github.com/bascht/31fa749d4121b9898902d5d557a01f82
>


Re: sstables changing in snapshots

2022-03-22 Thread Paul Chandler
Hi all,

Was there any further progress made on this? Did a Jira get created?

I have been debugging our backup scripts and seem to have found the same 
problem. 

As far as I can work out so far, it seems that this happens when a new snapshot 
is created and the old snapshot is being tarred.

I get a similar message:

/bin/tar: 
var/lib/cassandra/backup/keyspacename/tablename-4eec3b01aba811e896342351775ccc66/snapshots/csbackup_2022-03-22T14\\:04\\:05/nb-523601-big-Data.db:
 file changed as we read it

Thanks 

Paul 



> On 19 Mar 2022, at 02:41, Dinesh Joshi  wrote:
> 
> Do you have a repro that you can share with us? If so, please file a jira and 
> we'll take a look.
> 
>> On Mar 18, 2022, at 12:15 PM, James Brown > > wrote:
>> 
>> This in 4.0.3 after running nodetool snapshot that we're seeing sstables 
>> change, yes.
>> 
>> James Brown
>> Infrastructure Architect @ easypost.com 
>> 
>> On 2022-03-18 at 12:06:00, Jeff Jirsa > > wrote:
>>> This is nodetool snapshot yes? 3.11 or 4.0?
>>> 
>>> In versions prior to 3.0, sstables would be written with -tmp- in the name, 
>>> then renamed when complete, so an sstable definitely never changed once it 
>>> had the final file name. With the new transaction log mechanism, we use one 
>>> name and a transaction log to note what's in flight and what's not, so if 
>>> the snapshot system is including sstables being written (from flush, from 
>>> compaction, or from streaming), those aren't final and should be skipped.
>>> 
>>> 
>>> 
>>> 
>>> On Fri, Mar 18, 2022 at 11:46 AM James Brown >> > wrote:
>>> We use the boring combo of cassandra snapshots + tar to backup our 
>>> cassandra nodes; every once in a while, we'll notice tar failing with the 
>>> following:
>>> 
>>> tar: 
>>> data/addresses/addresses-eb0196100b7d11ec852b1541747d640a/snapshots/backup20220318183708/nb-167-big-Data.db:
>>>  file changed as we read it
>>> 
>>> I find this a bit perplexing; what would cause an sstable inside a snapshot 
>>> to change? The only thing I can think of is an incremental repair changing 
>>> the "repaired_at" flag on the sstable, but it seems like that should 
>>> "un-share" the hardlinked sstable rather than running the risk of mutating 
>>> a snapshot.
>>> 
>>> 
>>> James Brown
>>> Cassandra admin @ easypost.com 



Re: sstables changing in snapshots

2022-03-22 Thread Yifan Cai
I do not think there is a ticket already. Feel free to create one.
https://issues.apache.org/jira/projects/CASSANDRA/issues/

It would be helpful to provide
1. The version of the cassandra
2. The options used for snapshotting

- Yifan

On Tue, Mar 22, 2022 at 9:41 AM Paul Chandler  wrote:

> Hi all,
>
> Was there any further progress made on this? Did a Jira get created?
>
> I have been debugging our backup scripts and seem to have found the same
> problem.
>
> As far as I can work out so far, it seems that this happens when a new
> snapshot is created and the old snapshot is being tarred.
>
> I get a similar message:
>
> /bin/tar:
> var/lib/cassandra/backup/keyspacename/tablename-4eec3b01aba811e896342351775ccc66/snapshots/csbackup_2022-03-22T14\\:04\\:05/nb-523601-big-Data.db:
> file changed as we read it
>
> Thanks
>
> Paul
>
>
>
> On 19 Mar 2022, at 02:41, Dinesh Joshi  wrote:
>
> Do you have a repro that you can share with us? If so, please file a jira
> and we'll take a look.
>
> On Mar 18, 2022, at 12:15 PM, James Brown  wrote:
>
> This in 4.0.3 after running nodetool snapshot that we're seeing sstables
> change, yes.
>
> James Brown
> Infrastructure Architect @ easypost.com
>
>
> On 2022-03-18 at 12:06:00, Jeff Jirsa  wrote:
>
>> This is nodetool snapshot yes? 3.11 or 4.0?
>>
>> In versions prior to 3.0, sstables would be written with -tmp- in the
>> name, then renamed when complete, so an sstable definitely never changed
>> once it had the final file name. With the new transaction log mechanism, we
>> use one name and a transaction log to note what's in flight and what's not,
>> so if the snapshot system is including sstables being written (from flush,
>> from compaction, or from streaming), those aren't final and should be
>> skipped.
>>
>>
>>
>>
>> On Fri, Mar 18, 2022 at 11:46 AM James Brown  wrote:
>>
>>> We use the boring combo of cassandra snapshots + tar to backup our
>>> cassandra nodes; every once in a while, we'll notice tar failing with the
>>> following:
>>>
>>> tar:
>>> data/addresses/addresses-eb0196100b7d11ec852b1541747d640a/snapshots/backup20220318183708/nb-167-big-Data.db:
>>> file changed as we read it
>>>
>>> I find this a bit perplexing; what would cause an sstable inside a
>>> snapshot to change? The only thing I can think of is an incremental repair
>>> changing the "repaired_at" flag on the sstable, but it seems like that
>>> should "un-share" the hardlinked sstable rather than running the risk of
>>> mutating a snapshot.
>>>
>>>
>>> James Brown
>>> Cassandra admin @ easypost.com
>>>
>>
>
>


Re: sstables changing in snapshots

2022-03-22 Thread Jeff Jirsa
The most useful thing that folks can provide is an indication of what was
writing to those data files when you were doing backups.

It's almost certainly one of:
- Memtable flush
- Compaction
- Streaming from repair/move/bootstrap

If you have logs that indicate compaction starting/finishing with those
sstables, or memtable flushing those sstables, or if the .log file is
included in your backup, pasting the contents of that .log file into a
ticket will make this much easier to debug.



On Tue, Mar 22, 2022 at 9:49 AM Yifan Cai  wrote:

> I do not think there is a ticket already. Feel free to create one.
> https://issues.apache.org/jira/projects/CASSANDRA/issues/
>
> It would be helpful to provide
> 1. The version of the cassandra
> 2. The options used for snapshotting
>
> - Yifan
>
> On Tue, Mar 22, 2022 at 9:41 AM Paul Chandler  wrote:
>
>> Hi all,
>>
>> Was there any further progress made on this? Did a Jira get created?
>>
>> I have been debugging our backup scripts and seem to have found the same
>> problem.
>>
>> As far as I can work out so far, it seems that this happens when a new
>> snapshot is created and the old snapshot is being tarred.
>>
>> I get a similar message:
>>
>> /bin/tar:
>> var/lib/cassandra/backup/keyspacename/tablename-4eec3b01aba811e896342351775ccc66/snapshots/csbackup_2022-03-22T14\\:04\\:05/nb-523601-big-Data.db:
>> file changed as we read it
>>
>> Thanks
>>
>> Paul
>>
>>
>>
>> On 19 Mar 2022, at 02:41, Dinesh Joshi  wrote:
>>
>> Do you have a repro that you can share with us? If so, please file a jira
>> and we'll take a look.
>>
>> On Mar 18, 2022, at 12:15 PM, James Brown  wrote:
>>
>> This in 4.0.3 after running nodetool snapshot that we're seeing sstables
>> change, yes.
>>
>> James Brown
>> Infrastructure Architect @ easypost.com
>>
>>
>> On 2022-03-18 at 12:06:00, Jeff Jirsa  wrote:
>>
>>> This is nodetool snapshot yes? 3.11 or 4.0?
>>>
>>> In versions prior to 3.0, sstables would be written with -tmp- in the
>>> name, then renamed when complete, so an sstable definitely never changed
>>> once it had the final file name. With the new transaction log mechanism, we
>>> use one name and a transaction log to note what's in flight and what's not,
>>> so if the snapshot system is including sstables being written (from flush,
>>> from compaction, or from streaming), those aren't final and should be
>>> skipped.
>>>
>>>
>>>
>>>
>>> On Fri, Mar 18, 2022 at 11:46 AM James Brown 
>>> wrote:
>>>
 We use the boring combo of cassandra snapshots + tar to backup our
 cassandra nodes; every once in a while, we'll notice tar failing with the
 following:

 tar:
 data/addresses/addresses-eb0196100b7d11ec852b1541747d640a/snapshots/backup20220318183708/nb-167-big-Data.db:
 file changed as we read it

 I find this a bit perplexing; what would cause an sstable inside a
 snapshot to change? The only thing I can think of is an incremental repair
 changing the "repaired_at" flag on the sstable, but it seems like that
 should "un-share" the hardlinked sstable rather than running the risk of
 mutating a snapshot.


 James Brown
 Cassandra admin @ easypost.com

>>>
>>
>>


Re: sstables changing in snapshots

2022-03-22 Thread Yifan Cai
I am wondering if the cause is tarring while hardlinks are being created,
i.e. while a new snapshot is being taken.

A quick experiment on my Mac indicates that the file status change time
(ctime) is updated when a hardlink is created.

➜ stat -f "Access (atime): %Sa%nModify (mtime): %Sm%nChange (ctime): %Sc" a

Access (atime): Mar 22 10:03:43 2022

Modify (mtime): Mar 22 10:03:43 2022

Change (ctime): Mar 22 10:05:43 2022
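
The same effect can be reproduced on Linux with GNU coreutils stat (the
file name "a" below is just a scratch file, not anything from Cassandra):

  touch a
  stat -c "Change (ctime): %z" a
  sleep 2
  ln a b                            # create a hardlink, as a snapshot does
  stat -c "Change (ctime): %z" a    # ctime of the original file was bumped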

On Tue, Mar 22, 2022 at 10:01 AM Jeff Jirsa  wrote:

> The most useful thing that folks can provide is an indication of what was
> writing to those data files when you were doing backups.
>
> It's almost certainly one of:
> - Memtable flush
> - Compaction
> - Streaming from repair/move/bootstrap
>
> If you have logs that indicate compaction starting/finishing with those
> sstables, or memtable flushing those sstables, or if the .log file is
> included in your backup, pasting the contents of that .log file into a
> ticket will make this much easier to debug.
>
>
>
> On Tue, Mar 22, 2022 at 9:49 AM Yifan Cai  wrote:
>
>> I do not think there is a ticket already. Feel free to create one.
>> https://issues.apache.org/jira/projects/CASSANDRA/issues/
>>
>> It would be helpful to provide
>> 1. The version of the cassandra
>> 2. The options used for snapshotting
>>
>> - Yifan
>>
>> On Tue, Mar 22, 2022 at 9:41 AM Paul Chandler  wrote:
>>
>>> Hi all,
>>>
>>> Was there any further progress made on this? Did a Jira get created?
>>>
>>> I have been debugging our backup scripts and seem to have found the same
>>> problem.
>>>
>>> As far as I can work out so far, it seems that this happens when a new
>>> snapshot is created and the old snapshot is being tarred.
>>>
>>> I get a similar message:
>>>
>>> /bin/tar:
>>> var/lib/cassandra/backup/keyspacename/tablename-4eec3b01aba811e896342351775ccc66/snapshots/csbackup_2022-03-22T14\\:04\\:05/nb-523601-big-Data.db:
>>> file changed as we read it
>>>
>>> Thanks
>>>
>>> Paul
>>>
>>>
>>>
>>> On 19 Mar 2022, at 02:41, Dinesh Joshi  wrote:
>>>
>>> Do you have a repro that you can share with us? If so, please file a
>>> jira and we'll take a look.
>>>
>>> On Mar 18, 2022, at 12:15 PM, James Brown  wrote:
>>>
>>> This in 4.0.3 after running nodetool snapshot that we're seeing
>>> sstables change, yes.
>>>
>>> James Brown
>>> Infrastructure Architect @ easypost.com
>>>
>>>
>>> On 2022-03-18 at 12:06:00, Jeff Jirsa  wrote:
>>>
 This is nodetool snapshot yes? 3.11 or 4.0?

 In versions prior to 3.0, sstables would be written with -tmp- in the
 name, then renamed when complete, so an sstable definitely never changed
 once it had the final file name. With the new transaction log mechanism, we
 use one name and a transaction log to note what's in flight and what's not,
 so if the snapshot system is including sstables being written (from flush,
 from compaction, or from streaming), those aren't final and should be
 skipped.




 On Fri, Mar 18, 2022 at 11:46 AM James Brown 
 wrote:

> We use the boring combo of cassandra snapshots + tar to backup our
> cassandra nodes; every once in a while, we'll notice tar failing with the
> following:
>
> tar:
> data/addresses/addresses-eb0196100b7d11ec852b1541747d640a/snapshots/backup20220318183708/nb-167-big-Data.db:
> file changed as we read it
>
> I find this a bit perplexing; what would cause an sstable inside a
> snapshot to change? The only thing I can think of is an incremental repair
> changing the "repaired_at" flag on the sstable, but it seems like that
> should "un-share" the hardlinked sstable rather than running the risk of
> mutating a snapshot.
>
>
> James Brown
> Cassandra admin @ easypost.com
>

>>>
>>>


Re: sstables changing in snapshots

2022-03-22 Thread Paul Chandler
I will do a few more tests to see if I can pinpoint what is causing this, then 
I will create a Jira ticket.

This is actually a copy of a cluster that I am testing with, so the only writes 
happening to the cluster are internal ones; I would be surprised if it were 
compaction or memtable flushes on the offending tables. There could be repairs 
going on on the cluster, though.

This is a 4.0.0 cluster, but I think I have the same problem on a 3.11.6 
cluster; I have not tested the 3.11.6 version yet.

So I will try and get as much detail together before creating the ticket.

Thanks 

Paul


> On 22 Mar 2022, at 17:01, Jeff Jirsa  wrote:
> 
> The most useful thing that folks can provide is an indication of what was 
> writing to those data files when you were doing backups.
> 
> It's almost certainly one of:
> - Memtable flush 
> - Compaction
> - Streaming from repair/move/bootstrap
> 
> If you have logs that indicate compaction starting/finishing with those 
> sstables, or memtable flushing those sstables, or if the .log file is 
> included in your backup, pasting the contents of that .log file into a ticket 
> will make this much easier to debug.
> 
> 
> 
> On Tue, Mar 22, 2022 at 9:49 AM Yifan Cai  > wrote:
> I do not think there is a ticket already. Feel free to create one. 
> https://issues.apache.org/jira/projects/CASSANDRA/issues/ 
> 
> 
> It would be helpful to provide
> 1. The version of the cassandra
> 2. The options used for snapshotting 
> 
> - Yifan
> 
> On Tue, Mar 22, 2022 at 9:41 AM Paul Chandler  > wrote:
> Hi all,
> 
> Was there any further progress made on this? Did a Jira get created?
> 
> I have been debugging our backup scripts and seem to have found the same 
> problem. 
> 
> As far as I can work out so far, it seems that this happens when a new 
> snapshot is created and the old snapshot is being tarred.
> 
> I get a similar message:
> 
> /bin/tar: 
> var/lib/cassandra/backup/keyspacename/tablename-4eec3b01aba811e896342351775ccc66/snapshots/csbackup_2022-03-22T14\\:04\\:05/nb-523601-big-Data.db:
>  file changed as we read it
> 
> Thanks 
> 
> Paul 
> 
> 
> 
>> On 19 Mar 2022, at 02:41, Dinesh Joshi > > wrote:
>> 
>> Do you have a repro that you can share with us? If so, please file a jira 
>> and we'll take a look.
>> 
>>> On Mar 18, 2022, at 12:15 PM, James Brown >> > wrote:
>>> 
>>> This in 4.0.3 after running nodetool snapshot that we're seeing sstables 
>>> change, yes.
>>> 
>>> James Brown
>>> Infrastructure Architect @ easypost.com 
>>> 
>>> On 2022-03-18 at 12:06:00, Jeff Jirsa >> > wrote:
 This is nodetool snapshot yes? 3.11 or 4.0?
 
 In versions prior to 3.0, sstables would be written with -tmp- in the 
 name, then renamed when complete, so an sstable definitely never changed 
 once it had the final file name. With the new transaction log mechanism, 
 we use one name and a transaction log to note what's in flight and what's 
 not, so if the snapshot system is including sstables being written (from 
 flush, from compaction, or from streaming), those aren't final and should 
 be skipped.
 
 
 
 
 On Fri, Mar 18, 2022 at 11:46 AM James Brown >>> > wrote:
 We use the boring combo of cassandra snapshots + tar to backup our 
 cassandra nodes; every once in a while, we'll notice tar failing with the 
 following:
 
 tar: 
 data/addresses/addresses-eb0196100b7d11ec852b1541747d640a/snapshots/backup20220318183708/nb-167-big-Data.db:
  file changed as we read it
 
 I find this a bit perplexing; what would cause an sstable inside a 
 snapshot to change? The only thing I can think of is an incremental repair 
 changing the "repaired_at" flag on the sstable, but it seems like that 
 should "un-share" the hardlinked sstable rather than running the risk of 
 mutating a snapshot.
 
 
 James Brown
 Cassandra admin @ easypost.com 
> 



Re: sstables changing in snapshots

2022-03-22 Thread Paulo Motta
How does the backup process ensure the snapshot is taken before starting to
upload it? A snapshot is only safe to use after the "manifest.json" file
is written.

I wonder if the snapshot is being compressed while the snapshot file is
still being created.
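
If that is the case, one mitigation would be for the backup script to wait
for the snapshot manifest before starting the tar; a rough sketch, assuming
the manifest marks the snapshot as usable (when exactly it is written is
clarified later in this thread) and with the keyspace/table names and $TAG
as placeholders:

  SNAP_DIR="/var/lib/cassandra/data/<keyspace>/<table-id>/snapshots/$TAG"
  # wait for Cassandra to write the snapshot manifest before archiving
  until [ -f "$SNAP_DIR/manifest.json" ]; do sleep 1; done
  tar -czf "$TAG.tar.gz" -C "$SNAP_DIR" .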

On Tue, 22 Mar 2022 at 14:17, Paul Chandler wrote:

> I will do a few more tests to see if I can pin point what is causing this,
> then I will create a Jira ticket.
>
> This is actually a copy of a cluster that I am testing with, so the only
> writes happening to the cluster are internal ones, so I will be surprised
> if it is compaction or memtable flushes on the offending tables. There
> could be repairs going on on the cluster through.
>
> This is a 4.0.0 cluster, but I think I have the same problem on a 3.11.6
> cluster, but not tested the 3.11.6 version yet.
>
> So I will try and get as much detail together before creating the ticket.
>
> Thanks
>
> Paul
>
>
> On 22 Mar 2022, at 17:01, Jeff Jirsa  wrote:
>
> The most useful thing that folks can provide is an indication of what was
> writing to those data files when you were doing backups.
>
> It's almost certainly one of:
> - Memtable flush
> - Compaction
> - Streaming from repair/move/bootstrap
>
> If you have logs that indicate compaction starting/finishing with those
> sstables, or memtable flushing those sstables, or if the .log file is
> included in your backup, pasting the contents of that .log file into a
> ticket will make this much easier to debug.
>
>
>
> On Tue, Mar 22, 2022 at 9:49 AM Yifan Cai  wrote:
>
>> I do not think there is a ticket already. Feel free to create one.
>> https://issues.apache.org/jira/projects/CASSANDRA/issues/
>>
>> It would be helpful to provide
>> 1. The version of the cassandra
>> 2. The options used for snapshotting
>>
>> - Yifan
>>
>> On Tue, Mar 22, 2022 at 9:41 AM Paul Chandler  wrote:
>>
>>> Hi all,
>>>
>>> Was there any further progress made on this? Did a Jira get created?
>>>
>>> I have been debugging our backup scripts and seem to have found the same
>>> problem.
>>>
>>> As far as I can work out so far, it seems that this happens when a new
>>> snapshot is created and the old snapshot is being tarred.
>>>
>>> I get a similar message:
>>>
>>> /bin/tar:
>>> var/lib/cassandra/backup/keyspacename/tablename-4eec3b01aba811e896342351775ccc66/snapshots/csbackup_2022-03-22T14\\:04\\:05/nb-523601-big-Data.db:
>>> file changed as we read it
>>>
>>> Thanks
>>>
>>> Paul
>>>
>>>
>>>
>>> On 19 Mar 2022, at 02:41, Dinesh Joshi  wrote:
>>>
>>> Do you have a repro that you can share with us? If so, please file a
>>> jira and we'll take a look.
>>>
>>> On Mar 18, 2022, at 12:15 PM, James Brown  wrote:
>>>
>>> This in 4.0.3 after running nodetool snapshot that we're seeing
>>> sstables change, yes.
>>>
>>> James Brown
>>> Infrastructure Architect @ easypost.com
>>>
>>>
>>> On 2022-03-18 at 12:06:00, Jeff Jirsa  wrote:
>>>
 This is nodetool snapshot yes? 3.11 or 4.0?

 In versions prior to 3.0, sstables would be written with -tmp- in the
 name, then renamed when complete, so an sstable definitely never changed
 once it had the final file name. With the new transaction log mechanism, we
 use one name and a transaction log to note what's in flight and what's not,
 so if the snapshot system is including sstables being written (from flush,
 from compaction, or from streaming), those aren't final and should be
 skipped.




 On Fri, Mar 18, 2022 at 11:46 AM James Brown 
 wrote:

> We use the boring combo of cassandra snapshots + tar to backup our
> cassandra nodes; every once in a while, we'll notice tar failing with the
> following:
>
> tar:
> data/addresses/addresses-eb0196100b7d11ec852b1541747d640a/snapshots/backup20220318183708/nb-167-big-Data.db:
> file changed as we read it
>
> I find this a bit perplexing; what would cause an sstable inside a
> snapshot to change? The only thing I can think of is an incremental repair
> changing the "repaired_at" flag on the sstable, but it seems like that
> should "un-share" the hardlinked sstable rather than running the risk of
> mutating a snapshot.
>
>
> James Brown
> Cassandra admin @ easypost.com
>

>>>
>>>
>


Re: sstables changing in snapshots

2022-03-22 Thread Paul Chandler
Hi Yifan,

It looks like you are right: I can reproduce this. When the second snapshot 
is created, the ctime does get updated to the time of the second snapshot.

I guess this is what is causing tar to produce the error.
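
If the data blocks themselves are unchanged and only the ctime is bumped by
the new hardlinks, a pragmatic workaround on our side could be to tell GNU
tar not to treat that as an error; a sketch, with the archive and snapshot
paths as placeholders (GNU tar reports "file changed as we read it" via
exit status 1, so that status is tolerated here):

  tar --warning=no-file-changed -czf "$TAG.tar.gz" -C "$SNAPSHOT_DIR" . \
    || [ $? -eq 1 ]

That only papers over the warning, though; it does not help if the file
contents were genuinely still being written.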

Paul 

> On 22 Mar 2022, at 17:12, Yifan Cai  wrote:
> 
> I am wondering if the cause is tarring when creating hardlinks, i.e. creating 
> a new snapshot. 
> 
> A quick experiment on my Mac indicates the file status (ctime) is updated 
> when creating hardlink. 
> 
> ➜ stat -f "Access (atime): %Sa%nModify (mtime): %Sm%nChange (ctime): %Sc" a
> Access (atime): Mar 22 10:03:43 2022
> Modify (mtime): Mar 22 10:03:43 2022
> Change (ctime): Mar 22 10:05:43 2022
> 
> On Tue, Mar 22, 2022 at 10:01 AM Jeff Jirsa  > wrote:
> The most useful thing that folks can provide is an indication of what was 
> writing to those data files when you were doing backups.
> 
> It's almost certainly one of:
> - Memtable flush 
> - Compaction
> - Streaming from repair/move/bootstrap
> 
> If you have logs that indicate compaction starting/finishing with those 
> sstables, or memtable flushing those sstables, or if the .log file is 
> included in your backup, pasting the contents of that .log file into a ticket 
> will make this much easier to debug.
> 
> 
> 
> On Tue, Mar 22, 2022 at 9:49 AM Yifan Cai  > wrote:
> I do not think there is a ticket already. Feel free to create one. 
> https://issues.apache.org/jira/projects/CASSANDRA/issues/ 
> 
> 
> It would be helpful to provide
> 1. The version of the cassandra
> 2. The options used for snapshotting 
> 
> - Yifan
> 
> On Tue, Mar 22, 2022 at 9:41 AM Paul Chandler  > wrote:
> Hi all,
> 
> Was there any further progress made on this? Did a Jira get created?
> 
> I have been debugging our backup scripts and seem to have found the same 
> problem. 
> 
> As far as I can work out so far, it seems that this happens when a new 
> snapshot is created and the old snapshot is being tarred.
> 
> I get a similar message:
> 
> /bin/tar: 
> var/lib/cassandra/backup/keyspacename/tablename-4eec3b01aba811e896342351775ccc66/snapshots/csbackup_2022-03-22T14\\:04\\:05/nb-523601-big-Data.db:
>  file changed as we read it
> 
> Thanks 
> 
> Paul 
> 
> 
> 
>> On 19 Mar 2022, at 02:41, Dinesh Joshi > > wrote:
>> 
>> Do you have a repro that you can share with us? If so, please file a jira 
>> and we'll take a look.
>> 
>>> On Mar 18, 2022, at 12:15 PM, James Brown >> > wrote:
>>> 
>>> This in 4.0.3 after running nodetool snapshot that we're seeing sstables 
>>> change, yes.
>>> 
>>> James Brown
>>> Infrastructure Architect @ easypost.com 
>>> 
>>> On 2022-03-18 at 12:06:00, Jeff Jirsa >> > wrote:
 This is nodetool snapshot yes? 3.11 or 4.0?
 
 In versions prior to 3.0, sstables would be written with -tmp- in the 
 name, then renamed when complete, so an sstable definitely never changed 
 once it had the final file name. With the new transaction log mechanism, 
 we use one name and a transaction log to note what's in flight and what's 
 not, so if the snapshot system is including sstables being written (from 
 flush, from compaction, or from streaming), those aren't final and should 
 be skipped.
 
 
 
 
 On Fri, Mar 18, 2022 at 11:46 AM James Brown >>> > wrote:
 We use the boring combo of cassandra snapshots + tar to backup our 
 cassandra nodes; every once in a while, we'll notice tar failing with the 
 following:
 
 tar: 
 data/addresses/addresses-eb0196100b7d11ec852b1541747d640a/snapshots/backup20220318183708/nb-167-big-Data.db:
  file changed as we read it
 
 I find this a bit perplexing; what would cause an sstable inside a 
 snapshot to change? The only thing I can think of is an incremental repair 
 changing the "repaired_at" flag on the sstable, but it seems like that 
 should "un-share" the hardlinked sstable rather than running the risk of 
 mutating a snapshot.
 
 
 James Brown
 Cassandra admin @ easypost.com 
> 



Re: sstables changing in snapshots

2022-03-22 Thread James Brown
There are not overlapping snapshots, so I don't think it's a second
snapshot. There *are* overlapping repairs.

> How does the backup process ensure the snapshot is taken before starting to
> upload it?

It just runs nice nodetool ${jmx_args[@]} snapshot -t "$TAG" ${keyspaces[@]}

> A snapshot is only safe to use after the "manifest.json" file is written.

Is this true? I don't see this *anywhere* in the documentation for
Cassandra (I would expect it on the Backups page, for example) or in the
help of nodetool snapshot. It was my understanding that when the nodetool
snapshot process finished, the snapshot was done. If that's wrong, it
definitely could be that we're just jumping the gun.

James Brown
Infrastructure Architect @ easypost.com


On 2022-03-22 at 10:38:56, Paul Chandler  wrote:

> Hi Yifan,
>
> It looks like you are right, I can reproduce this, when creating the
> second snapshot the ctime does get updated to the time of the second
> snapshot.
>
> I guess this is what is causing tar to produce the error.
>
> Paul
>
> On 22 Mar 2022, at 17:12, Yifan Cai  wrote:
>
> I am wondering if the cause is tarring when creating hardlinks, i.e.
> creating a new snapshot.
>
> A quick experiment on my Mac indicates the file status (ctime) is
> updated when creating hardlink.
>
> *➜ *stat -f "Access (atime): %Sa%nModify (mtime): %Sm%nChange (ctime):
> %Sc" a
> Access (atime): Mar 22 10:03:43 2022
> Modify (mtime): Mar 22 10:03:43 2022
> Change (ctime): Mar 22 10:05:43 2022
>
> On Tue, Mar 22, 2022 at 10:01 AM Jeff Jirsa  wrote:
>
>> The most useful thing that folks can provide is an indication of what was
>> writing to those data files when you were doing backups.
>>
>> It's almost certainly one of:
>> - Memtable flush
>> - Compaction
>> - Streaming from repair/move/bootstrap
>>
>> If you have logs that indicate compaction starting/finishing with those
>> sstables, or memtable flushing those sstables, or if the .log file is
>> included in your backup, pasting the contents of that .log file into a
>> ticket will make this much easier to debug.
>>
>>
>>
>> On Tue, Mar 22, 2022 at 9:49 AM Yifan Cai  wrote:
>>
>>> I do not think there is a ticket already. Feel free to create one.
>>> https://issues.apache.org/jira/projects/CASSANDRA/issues/
>>>
>>> It would be helpful to provide
>>> 1. The version of the cassandra
>>> 2. The options used for snapshotting
>>>
>>> - Yifan
>>>
>>> On Tue, Mar 22, 2022 at 9:41 AM Paul Chandler  wrote:
>>>
 Hi all,

 Was there any further progress made on this? Did a Jira get created?

 I have been debugging our backup scripts and seem to have found the
 same problem.

 As far as I can work out so far, it seems that this happens when a new
 snapshot is created and the old snapshot is being tarred.

 I get a similar message:

 /bin/tar:
 var/lib/cassandra/backup/keyspacename/tablename-4eec3b01aba811e896342351775ccc66/snapshots/csbackup_2022-03-22T14\\:04\\:05/nb-523601-big-Data.db:
 file changed as we read it

 Thanks

 Paul



 On 19 Mar 2022, at 02:41, Dinesh Joshi  wrote:

 Do you have a repro that you can share with us? If so, please file a
 jira and we'll take a look.

 On Mar 18, 2022, at 12:15 PM, James Brown  wrote:

 This in 4.0.3 after running nodetool snapshot that we're seeing
 sstables change, yes.

 James Brown
 Infrastructure Architect @ easypost.com


 On 2022-03-18 at 12:06:00, Jeff Jirsa  wrote:

> This is nodetool snapshot yes? 3.11 or 4.0?
>
> In versions prior to 3.0, sstables would be written with -tmp- in the
> name, then renamed when complete, so an sstable definitely never changed
> once it had the final file name. With the new transaction log mechanism, 
> we
> use one name and a transaction log to note what's in flight and what's 
> not,
> so if the snapshot system is including sstables being written (from flush,
> from compaction, or from streaming), those aren't final and should be
> skipped.
>
>
>
>
> On Fri, Mar 18, 2022 at 11:46 AM James Brown 
> wrote:
>
>> We use the boring combo of cassandra snapshots + tar to backup our
>> cassandra nodes; every once in a while, we'll notice tar failing with the
>> following:
>>
>> tar:
>> data/addresses/addresses-eb0196100b7d11ec852b1541747d640a/snapshots/backup20220318183708/nb-167-big-Data.db:
>> file changed as we read it
>>
>> I find this a bit perplexing; what would cause an sstable inside a
>> snapshot to change? The only thing I can think of is an incremental 
>> repair
>> changing the "repaired_at" flag on the sstable, but it seems like that
>> should "un-share" the hardlinked sstable rather than running the risk of
>> mutating a snapshot.
>>
>>
>> James Brown
>> Cassandra admin @ easypost.com
>>
>


>

Re: sstables changing in snapshots

2022-03-22 Thread Paulo Motta
> It was my understanding that when the nodetool snapshot process finished,
> the snapshot was done.

This is correct. But snapshots could be partially available when using the
incremental_backups or snapshot_before_compaction options.

If the compression/upload process starts after nodetool snapshot finishes,
then this should be safe.

On Tue, 22 Mar 2022 at 20:53, James Brown wrote:

> There are not overlapping snapshots, so I don't think it's a second
> snapshot. There *are* overlapping repairs.
>
> How does the backup process ensure the snapshot is taken before starting
>> to upload it ?
>>
>
> It just runs nice nodetool ${jmx_args[@]} snapshot -t "$TAG"
> ${keyspaces[@]}
>
> A snapshot is only safe to use after the "manifest.json" file is written.
>>
>
> Is this true? I don't see this *anywhere* in the documentation for
> Cassandra (I would expect it on the Backups page, for example) or in the
> help of nodetool snapshot. It was my understanding that when the nodetool
> snapshot process finished, the snapshot was done. If that's wrong, it
> definitely could be that we're just jumping the gun.
>
> James Brown
> Infrastructure Architect @ easypost.com
>
>
> On 2022-03-22 at 10:38:56, Paul Chandler  wrote:
>
>> Hi Yifan,
>>
>> It looks like you are right, I can reproduce this, when creating the
>> second snapshot the ctime does get updated to the time of the second
>> snapshot.
>>
>> I guess this is what is causing tar to produce the error.
>>
>> Paul
>>
>> On 22 Mar 2022, at 17:12, Yifan Cai  wrote:
>>
>> I am wondering if the cause is tarring when creating hardlinks, i.e.
>> creating a new snapshot.
>>
>> A quick experiment on my Mac indicates the file status (ctime) is
>> updated when creating hardlink.
>>
>> *➜ *stat -f "Access (atime): %Sa%nModify (mtime): %Sm%nChange (ctime):
>> %Sc" a
>> Access (atime): Mar 22 10:03:43 2022
>> Modify (mtime): Mar 22 10:03:43 2022
>> Change (ctime): Mar 22 10:05:43 2022
>>
>> On Tue, Mar 22, 2022 at 10:01 AM Jeff Jirsa  wrote:
>>
>>> The most useful thing that folks can provide is an indication of what
>>> was writing to those data files when you were doing backups.
>>>
>>> It's almost certainly one of:
>>> - Memtable flush
>>> - Compaction
>>> - Streaming from repair/move/bootstrap
>>>
>>> If you have logs that indicate compaction starting/finishing with those
>>> sstables, or memtable flushing those sstables, or if the .log file is
>>> included in your backup, pasting the contents of that .log file into a
>>> ticket will make this much easier to debug.
>>>
>>>
>>>
>>> On Tue, Mar 22, 2022 at 9:49 AM Yifan Cai  wrote:
>>>
 I do not think there is a ticket already. Feel free to create one.
 https://issues.apache.org/jira/projects/CASSANDRA/issues/

 It would be helpful to provide
 1. The version of the cassandra
 2. The options used for snapshotting

 - Yifan

 On Tue, Mar 22, 2022 at 9:41 AM Paul Chandler 
 wrote:

> Hi all,
>
> Was there any further progress made on this? Did a Jira get created?
>
> I have been debugging our backup scripts and seem to have found the
> same problem.
>
> As far as I can work out so far, it seems that this happens when a new
> snapshot is created and the old snapshot is being tarred.
>
> I get a similar message:
>
> /bin/tar:
> var/lib/cassandra/backup/keyspacename/tablename-4eec3b01aba811e896342351775ccc66/snapshots/csbackup_2022-03-22T14\\:04\\:05/nb-523601-big-Data.db:
> file changed as we read it
>
> Thanks
>
> Paul
>
>
>
> On 19 Mar 2022, at 02:41, Dinesh Joshi  wrote:
>
> Do you have a repro that you can share with us? If so, please file a
> jira and we'll take a look.
>
> On Mar 18, 2022, at 12:15 PM, James Brown  wrote:
>
> This in 4.0.3 after running nodetool snapshot that we're seeing
> sstables change, yes.
>
> James Brown
> Infrastructure Architect @ easypost.com
>
>
> On 2022-03-18 at 12:06:00, Jeff Jirsa  wrote:
>
>> This is nodetool snapshot yes? 3.11 or 4.0?
>>
>> In versions prior to 3.0, sstables would be written with -tmp- in the
>> name, then renamed when complete, so an sstable definitely never changed
>> once it had the final file name. With the new transaction log mechanism, 
>> we
>> use one name and a transaction log to note what's in flight and what's 
>> not,
>> so if the snapshot system is including sstables being written (from 
>> flush,
>> from compaction, or from streaming), those aren't final and should be
>> skipped.
>>
>>
>>
>>
>> On Fri, Mar 18, 2022 at 11:46 AM James Brown 
>> wrote:
>>
>>> We use the boring combo of cassandra snapshots + tar to backup our
>>> cassandra nodes; every once in a while, we'll notice tar failing with 
>>> the
>>> following:
>>>
>>> tar:
>>> data/addresses/addresses-e

Re: sstables changing in snapshots

2022-03-22 Thread Dinesh Joshi
Cassandra creates hardlinks[1] first and then writes the manifest[2]. But that 
is not the last thing it writes either[3]. This should definitely be 
documented. Could you please open a jira?

[1] 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1956
[2] 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1977
[3] 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1981

> On Mar 22, 2022, at 4:53 PM, James Brown  wrote:
> 
> There are not overlapping snapshots, so I don't think it's a second snapshot. 
> There are overlapping repairs.
> 
>> How does the backup process ensure the snapshot is taken before starting to 
>> upload it ? 
> 
> It just runs nice nodetool ${jmx_args[@]} snapshot -t "$TAG" ${keyspaces[@]}
> 
>> A snapshot is only safe to use after the "manifest.json" file is written.
> 
> Is this true? I don't see this anywhere in the documentation for Cassandra (I 
> would expect it on the Backups page, for example) or in the help of nodetool 
> snapshot. It was my understanding that when the nodetool snapshot process 
> finished, the snapshot was done. If that's wrong, it definitely could be that 
> we're just jumping the gun.
> 
> James Brown
> Infrastructure Architect @ easypost.com
> 
> 
> On 2022-03-22 at 10:38:56, Paul Chandler  wrote:
>> Hi Yifan,
>> 
>> It looks like you are right, I can reproduce this, when creating the second 
>> snapshot the ctime does get updated to the time of the second snapshot.
>> 
>> I guess this is what is causing tar to produce the error.
>> 
>> Paul 
>> 
>>> On 22 Mar 2022, at 17:12, Yifan Cai  wrote:
>>> 
>>> I am wondering if the cause is tarring when creating hardlinks, i.e. 
>>> creating a new snapshot. 
>>> 
>>> A quick experiment on my Mac indicates the file status (ctime) is updated 
>>> when creating hardlink. 
>>> 
>>> ➜ stat -f "Access (atime): %Sa%nModify (mtime): %Sm%nChange (ctime): %Sc" a
>>> Access (atime): Mar 22 10:03:43 2022
>>> Modify (mtime): Mar 22 10:03:43 2022
>>> Change (ctime): Mar 22 10:05:43 2022
>>> 
>>> On Tue, Mar 22, 2022 at 10:01 AM Jeff Jirsa  wrote:
>>> The most useful thing that folks can provide is an indication of what was 
>>> writing to those data files when you were doing backups.
>>> 
>>> It's almost certainly one of:
>>> - Memtable flush 
>>> - Compaction
>>> - Streaming from repair/move/bootstrap
>>> 
>>> If you have logs that indicate compaction starting/finishing with those 
>>> sstables, or memtable flushing those sstables, or if the .log file is 
>>> included in your backup, pasting the contents of that .log file into a 
>>> ticket will make this much easier to debug.
>>> 
>>> 
>>> 
>>> On Tue, Mar 22, 2022 at 9:49 AM Yifan Cai  wrote:
>>> I do not think there is a ticket already. Feel free to create one. 
>>> https://issues.apache.org/jira/projects/CASSANDRA/issues/
>>> 
>>> It would be helpful to provide
>>> 1. The version of the cassandra
>>> 2. The options used for snapshotting 
>>> 
>>> - Yifan
>>> 
>>> On Tue, Mar 22, 2022 at 9:41 AM Paul Chandler  wrote:
>>> Hi all,
>>> 
>>> Was there any further progress made on this? Did a Jira get created?
>>> 
>>> I have been debugging our backup scripts and seem to have found the same 
>>> problem. 
>>> 
>>> As far as I can work out so far, it seems that this happens when a new 
>>> snapshot is created and the old snapshot is being tarred.
>>> 
>>> I get a similar message:
>>> 
>>> /bin/tar: 
>>> var/lib/cassandra/backup/keyspacename/tablename-4eec3b01aba811e896342351775ccc66/snapshots/csbackup_2022-03-22T14\\:04\\:05/nb-523601-big-Data.db:
>>>  file changed as we read it
>>> 
>>> Thanks 
>>> 
>>> Paul 
>>> 
>>> 
>>> 
 On 19 Mar 2022, at 02:41, Dinesh Joshi  wrote:
 
 Do you have a repro that you can share with us? If so, please file a jira 
 and we'll take a look.
 
> On Mar 18, 2022, at 12:15 PM, James Brown  wrote:
> 
> This in 4.0.3 after running nodetool snapshot that we're seeing sstables 
> change, yes.
> 
> James Brown
> Infrastructure Architect @ easypost.com
> 
> 
> On 2022-03-18 at 12:06:00, Jeff Jirsa  wrote:
>> This is nodetool snapshot yes? 3.11 or 4.0?
>> 
>> In versions prior to 3.0, sstables would be written with -tmp- in the 
>> name, then renamed when complete, so an sstable definitely never changed 
>> once it had the final file name. With the new transaction log mechanism, 
>> we use one name and a transaction log to note what's in flight and 
>> what's not, so if the snapshot system is including sstables being 
>> written (from flush, from compaction, or from streaming), those aren't 
>> final and should be skipped.
>> 
>> 
>> 
>> 
>> On Fri, Mar 18, 2022 at 11:46 AM James Brown  wrote:
>> We use the boring combo of cassandra snapshots + tar to backup our 
>> cassandr

Re: sstables changing in snapshots

2022-03-22 Thread James Brown
I filed https://issues.apache.org/jira/browse/CASSANDRA-17473 for this
thread as a whole.

Would you like a separate Jira issue on the matter of documenting how to
tell when a snapshot is "ready"?

James Brown
Infrastructure Architect @ easypost.com


On 2022-03-22 at 17:41:23, Dinesh Joshi  wrote:

> Cassandra creates hardlinks[1] first and then writes the manifest[2]. But
> that is not the last thing it writes either[3]. This should definitely be
> documented. Could you please open a jira?
>
> [1]
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1956
> [2]
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1977
> [3]
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1981
>
> On Mar 22, 2022, at 4:53 PM, James Brown  wrote:
>
>
> There are not overlapping snapshots, so I don't think it's a second
> snapshot. There are overlapping repairs.
>
>
> > How does the backup process ensure the snapshot is taken before starting
> to upload it ?
>
>
> It just runs nice nodetool ${jmx_args[@]} snapshot -t "$TAG"
> ${keyspaces[@]}
>
>
> > A snapshot is only safe to use after the "manifest.json" file is written.
>
>
> Is this true? I don't see this anywhere in the documentation for Cassandra
> (I would expect it on the Backups page, for example) or in the help of
> nodetool snapshot. It was my understanding that when the nodetool snapshot
> process finished, the snapshot was done. If that's wrong, it definitely
> could be that we're just jumping the gun.
>
>
> James Brown
>
> Infrastructure Architect @ easypost.com
>
>
>
> On 2022-03-22 at 10:38:56, Paul Chandler  wrote:
>
> > Hi Yifan,
>
> >
>
> > It looks like you are right, I can reproduce this, when creating the
> second snapshot the ctime does get updated to the time of the second
> snapshot.
>
> >
>
> > I guess this is what is causing tar to produce the error.
>
> >
>
> > Paul
>
> >
>
> >> On 22 Mar 2022, at 17:12, Yifan Cai  wrote:
>
> >>
>
> >> I am wondering if the cause is tarring when creating hardlinks, i.e.
> creating a new snapshot.
>
> >>
>
> >> A quick experiment on my Mac indicates the file status (ctime) is
> updated when creating hardlink.
>
> >>
>
> >> ➜ stat -f "Access (atime): %Sa%nModify (mtime): %Sm%nChange (ctime):
> %Sc" a
>
> >> Access (atime): Mar 22 10:03:43 2022
>
> >> Modify (mtime): Mar 22 10:03:43 2022
>
> >> Change (ctime): Mar 22 10:05:43 2022
>
> >>
>
> >> On Tue, Mar 22, 2022 at 10:01 AM Jeff Jirsa  wrote:
>
> >> The most useful thing that folks can provide is an indication of what
> was writing to those data files when you were doing backups.
>
> >>
>
> >> It's almost certainly one of:
>
> >> - Memtable flush
>
> >> - Compaction
>
> >> - Streaming from repair/move/bootstrap
>
> >>
>
> >> If you have logs that indicate compaction starting/finishing with those
> sstables, or memtable flushing those sstables, or if the .log file is
> included in your backup, pasting the contents of that .log file into a
> ticket will make this much easier to debug.
>
> >>
>
> >>
>
> >>
>
> >> On Tue, Mar 22, 2022 at 9:49 AM Yifan Cai  wrote:
>
> >> I do not think there is a ticket already. Feel free to create one.
> https://issues.apache.org/jira/projects/CASSANDRA/issues/
>
> >>
>
> >> It would be helpful to provide
>
> >> 1. The version of the cassandra
>
> >> 2. The options used for snapshotting
>
> >>
>
> >> - Yifan
>
> >>
>
> >> On Tue, Mar 22, 2022 at 9:41 AM Paul Chandler 
> wrote:
>
> >> Hi all,
>
> >>
>
> >> Was there any further progress made on this? Did a Jira get created?
>
> >>
>
> >> I have been debugging our backup scripts and seem to have found the
> same problem.
>
> >>
>
> >> As far as I can work out so far, it seems that this happens when a new
> snapshot is created and the old snapshot is being tarred.
>
> >>
>
> >> I get a similar message:
>
> >>
>
> >> /bin/tar:
> var/lib/cassandra/backup/keyspacename/tablename-4eec3b01aba811e896342351775ccc66/snapshots/csbackup_2022-03-22T14\\:04\\:05/nb-523601-big-Data.db:
> file changed as we read it
>
> >>
>
> >> Thanks
>
> >>
>
> >> Paul
>
> >>
>
> >>
>
> >>
>
> >>> On 19 Mar 2022, at 02:41, Dinesh Joshi  wrote:
>
> >>>
>
> >>> Do you have a repro that you can share with us? If so, please file a
> jira and we'll take a look.
>
> >>>
>
>  On Mar 18, 2022, at 12:15 PM, James Brown 
> wrote:
>
> 
>
>  This in 4.0.3 after running nodetool snapshot that we're seeing
> sstables change, yes.
>
> 
>
>  James Brown
>
>  Infrastructure Architect @ easypost.com
>
> 
>
> 
>
>  On 2022-03-18 at 12:06:00, Jeff Jirsa  wrote:
>
> > This is nodetool snapshot yes? 3.11 or 4.0?
>
> >
>
> > In versions prior to 3.0, sstables would be written with -tmp- in
> the name, then renamed when complete, so an sstable definitely never
> changed once it had the final file name. With the new transaction log
>

Cassandra 3.0.14 transport completely blocked

2022-03-22 Thread Jaydeep Chovatia
Hi,

I have been using Cassandra 3.0.14 in production for a long time. Recently
I have run into a bug where, all of a sudden, the transport thread pool
hangs.

*Observation:*
If I run *nodetool tpstats*, it shows *"Native-Transport-Requests"* blocking
"Active" tasks. I stopped the traffic completely and sent only a very light
load, but my requests are still being denied, and blocked transport tasks
keep accumulating.

*Fix:*
If I restart my cluster, then everything works fine, which means there
might be some deadlock, etc. in the system.
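
Before restarting next time, I could capture a thread dump while the pool is
stuck, so the blocking stack traces are preserved; a minimal sketch, assuming
the JDK's jstack is available on the node and a single Cassandra process
(the pgrep pattern is an assumption):

  # dump all threads, including lock ownership, of the running Cassandra JVM
  jstack -l "$(pgrep -f CassandraDaemon)" > /tmp/cassandra-threads-$(date +%s).txt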


Is anyone aware of this issue? I know there have been quite a lot of fixes
on top of 3.0.14; is there any specific fix that addresses this particular
issue?

Any help would be appreciated.

Yours Sincerely,
Jaydeep


Re: Cassandra 3.0.14 transport completely blocked

2022-03-22 Thread C. Scott Andreas

Hi Jaydeep, thanks for reaching out.

The most notable deadlock identified and resolved in the last few years is
https://issues.apache.org/jira/browse/CASSANDRA-15367: Memtable memory
allocations may deadlock (fixed in Apache Cassandra 3.0.21).

Mentioning for completeness - since the release of Cassandra 3.0.14 several
years ago, many critical bugs whose consequences include data loss have been
resolved. I'd strongly recommend upgrading to 3.0.26 - and ideally to 4.0
after you've confirmed behavior is as expected on 3.0.26.

– Scott

On Mar 22, 2022, at 9:30 PM, Jaydeep Chovatia  wrote:

> Hi,
>
> I have been using Cassandra 3.0.14 in production for a long time. Recently
> I have found a bug in that, all of a sudden the transport thread-pool hangs.
>
> Observation:
> If I do nodetool tpstats, then it shows "Native-Transport-Requests" is
> blocking "Active" tasks. I stopped the complete traffic, and sent a very
> light load, but still my requests are getting denied, and active transport
> blocked tasks keep happening.
>
> Fix:
> If I restart my cluster, then everything works fine, which means there
> might be some deadlock, etc. in the system.
>
> Is anyone aware of this issue? I know there have been quite a lot of fixes
> on top of 3.0.14, is there any specific fix that addresses this particular
> issue?
>
> Any help would be appreciated.
>
> Yours Sincerely,
> Jaydeep

Re: Cassandra 3.0.14 transport completely blocked

2022-03-22 Thread Jaydeep Chovatia
Thanks, Scott, for the prompt response! We will apply this patch and see
how it goes.
Also, in the near future, we will consider upgrading to 3.0.26 and
eventually to 4.0
Thanks a lot!

On Tue, Mar 22, 2022 at 9:45 PM C. Scott Andreas 
wrote:

> Hi Jaydeep, thanks for reaching out.
>
> The most notable deadlock identified and resolved in the last few years is
> https://issues.apache.org/jira/browse/CASSANDRA-15367: Memtable memory
> allocations may deadlock (fixed in Apache Cassandra 3.0.21).
>
> Mentioning for completeness - since the release of Cassandra 3.0.14
> several years ago, many critical bugs whose consequences include data loss
> have been resolved. I'd strongly recommend upgrading to 3.0.26 - and
> ideally to 4.0 after you've confirmed behavior is as expected on 3.0.26.
>
> – Scott
>
> On Mar 22, 2022, at 9:30 PM, Jaydeep Chovatia 
> wrote:
>
>
> Hi,
>
> I have been using Cassandra 3.0.14 in production for a long time. Recently
> I have found a bug in that, all of a sudden the transport thread-pool
> hangs.
>
> *Observation:*
> If I do *nodetool tpstats*, then it shows *"Native-Transport-Requests"*
> is blocking "Active" tasks. I stopped the complete traffic, and sent a very
> light load, but still my requests are getting denied, and active transport
> blocked tasks keep happening.
>
> *Fix:*
> If I restart my cluster, then everything works fine, which means there
> might be some deadlock, etc. in the system.
>
>
> Is anyone aware of this issue? I know there have been quite a lot of fixes
> on top of 3.0.14, is there any specific fix that addresses this particular
> issue?
>
> Any help would be appreciated.
>
> Yours Sincerely,
> Jaydeep
>
>
>
>
>


Re: Cassandra 3.0.14 transport completely blocked

2022-03-22 Thread Erick Ramirez
>
> Thanks, Scott, for the prompt response! We will apply this patch and see
> how it goes.
> Also, in the near future, we will consider upgrading to 3.0.26 and
> eventually to 4.0
>

We would really discourage you from upgrading to just C* 3.0.21; there is
no compelling reason to stop there. If you're going to the trouble of
upgrading the binaries, you might as well go all the way to C* 3.0.26,
since it's a prerequisite to eventually upgrading to C* 4.0. Cheers!