Re: Large number of tiny sstables flushed constantly

2021-08-12 Thread Bowen Song

Hello Jiayong,


Using multiple disks in a RAID0 array for the Cassandra data directory is 
not recommended. You will get better fault tolerance, and often better 
performance too, with multiple data directories, one on each disk.
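
For illustration, a minimal cassandra.yaml sketch of what I mean by one 
data directory per disk (the mount points are made-up examples, not your 
actual paths):

# cassandra.yaml - one data directory per physical SSD; the paths below
# are hypothetical, substitute your own mount points.
data_file_directories:
    - /mnt/disk1/cassandra/data
    - /mnt/disk2/cassandra/data
    - /mnt/disk3/cassandra/data
    - /mnt/disk4/cassandra/data

Combined with an appropriate disk_failure_policy, a single disk failure 
then only affects the data in that directory rather than the whole array.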


If you stick with RAID0, it's not 4 disks, it's 1 from Cassandra's point 
of view, because any read or write operation will have to touch all 4 
member disks. Therefore, using 4 flush writers doesn't make much sense.
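
In config terms, a purely illustrative snippet (the value shown is simply 
the stock default, not a tuned recommendation):

# cassandra.yaml - with a single RAID0 volume there is effectively one
# data directory, so the default of 2 flush writers is usually enough;
# extra writers mainly pay off when flushes can go to several independent
# data directories in parallel.
memtable_flush_writers: 2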


On the frequent SSTable flush issue, a quick internet search leads me to:

   * an old bug in Cassandra 2.1 - CASSANDRA-8409 - which shouldn't 
     affect 3.x at all

   * a StackOverflow question which may be related

Did you run repair? Do you use materialized views?


Regards,

Bowen


On 11/08/2021 15:58, Jiayong Sun wrote:

Hi Erick,

The nodes have 4 SSDs (1TB each, but we only use 2.4TB of the space; 
current disk usage is about 50%) in RAID0.
Based on the number of disks, we increased memtable_flush_writers to 4 
instead of the default of 2.


For the following we set:
- max heap size - 31GB
- memtable_heap_space_in_mb (use default)
- memtable_offheap_space_in_mb  (use default)
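
For the two defaults above, my understanding of the 3.x behaviour (worth 
double-checking against the comments in your own cassandra.yaml) is that 
each unset value falls back to a quarter of the heap, so with a 31GB heap 
the effective sizes are roughly:

# cassandra.yaml - approximate effective values when both settings are
# left unset (assumes the documented quarter-of-heap default):
# memtable_heap_space_in_mb: 7936      # ~31GB / 4
# memtable_offheap_space_in_mb: 7936   # ~31GB / 4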

In the logs, we also noticed the system.sstable_activity table has 
hundreds of MB or even GBs of data and is constantly flushing:
DEBUG [NativePoolCleaner]  ColumnFamilyStore.java:932 - 
Enqueuing flush of sstable_activity: 0.293KiB (0%) on-heap, 0.107KiB 
(0%) off-heap
DEBUG [NonPeriodicTasks:1]  SSTable.java:105 - Deleting 
sstable: 
/app/cassandra/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/md-103645-big
DEBUG [NativePoolCleaner]  ColumnFamilyStore.java:1322 - 
Flushing largest CFS(Keyspace='system', 
ColumnFamily='sstable_activity') to free up room. Used total: 
0.06/1.00, live: 0.00/0.00, flushing: 0.02/0.29, this: 0.00/0.00


Thanks,
Jiayong Sun
On Wednesday, August 11, 2021, 12:06:27 AM PDT, Erick Ramirez wrote:



4 flush writers isn't bad since the default is 2. It doesn't make a 
difference if you have fast disks (like NVMe SSDs) because only 1 
thread gets used.


But if flushes are slow, the work gets distributed across the 4 flush 
writers, so you end up with smaller flush sizes, although it's difficult 
to tell how tiny the SSTables would be without analysing the logs and 
the overall performance of your cluster.
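
A hedged aside on why the flushes get smaller: part of it is config-level. 
The stock cassandra.yaml documents memtable_cleanup_threshold as defaulting 
to a formula tied to the flush writer count, so (please verify against 
your exact version) the sketch below shows the effect of going from 2 to 
4 writers:

# cassandra.yaml - memtable_cleanup_threshold controls how full the
# memtable pool may get before the largest memtable is flushed to free
# room. When left unset it defaults to:
#     1 / (memtable_flush_writers + 1)
# so raising memtable_flush_writers from 2 to 4 lowers the threshold
# from ~0.33 to 0.20, i.e. flushes trigger earlier and the resulting
# SSTables are smaller.
# memtable_cleanup_threshold: 0.2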


Was there a specific reason you decided to bump it up to 4? I'm just 
trying to get a sense of why you did it since it might provide some 
clues. Out of curiosity, what do you have set for the following?

- max heap size
- memtable_heap_space_in_mb
- memtable_offheap_space_in_mb



Re: New Servers - Cassandra 4

2021-08-12 Thread Elliott Sims
Depends on your availability requirements, but in general I'd say if you're
going with N replicas, you'd want N failure domains (where one blade
chassis is a failure domain).

On Tue, Aug 10, 2021 at 11:16 PM Erick Ramirez wrote:

> That's 430TB of eggs in one 4U basket, so consider that against your
> MTTR requirements. I fully understand the motivation for that kind of
> configuration, but *personally* I wouldn't want to be responsible for
> its day-to-day operation. But maybe that's just me. 😁
>


Re: Large number of tiny sstables flushed constantly

2021-08-12 Thread Jiayong Sun
Hello Bowen,

Thanks for your response. Yes, we are aware of the trade-off between 
RAID0 and individual JBOD disks, but all of our clusters use this RAID0 
configuration through Azure, and we only see this issue on this cluster, 
so it's hard to conclude that the root cause is the disks. This looks 
more workload-related, and we are seeking feedback here on any other 
parameters in the yaml that we could tune for this.

Thanks again,
Jiayong Sun