Re: counter cache loading very slow

2021-04-28 Thread Gil Ganz
I can see that the gap between the last log line and the counter cache loading
message is very long, and the log itself says so (2602567 ms is about 43
minutes):
INFO  [pool-5-thread-1] 2021-04-25 09:56:56,849 AutoSavingCache.java:174 -
Completed loading (2602567 ms; 1277885 keys) CounterCache cache

I didn't have a chance to test with the counter cache disabled yet; we want to
let the system function properly for a few days after encountering many other
issues.
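
When we do get to test it, the plan is roughly the following cassandra.yaml
change (a sketch, assuming otherwise-default settings):

    # Stop persisting the counter cache to disk (the default save period is
    # 7200s); with 0 there is no saved CounterCache file to load on restart.
    counter_cache_save_period: 0

    # Alternatively, a size of 0 disables the counter cache entirely, at the
    # cost of extra read-before-write work on counter updates.
    # counter_cache_size_in_mb: 0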


On Tue, Apr 27, 2021 at 4:33 AM Kane Wilson  wrote:

> Sounds like you're potentially hitting a bug, maybe even one that hasn't
> been hit before. How are you determining it's counters that are the
> problem? Is it stalling on the Initializing counters log line or something?
>
> raft.so - Cassandra consulting, support, and managed services
>
>
> On Mon, Apr 26, 2021 at 3:25 AM Gil Ganz  wrote:
>
>> Hey
>> I have a cluster on 3.11.6 where startup is very slow: an i3en.xlarge
>> server with about 1 TB of data takes 45 minutes to start up, and almost 40
>> minutes of that is spent loading the saved counter cache (200 MB) from
>> disk. During these 40 minutes the amount of data read from disk is very
>> high, up to 700 MB/s. Counters are a feature used heavily in this
>> environment, including in the main table in the DB, which is half the data
>> size.
>>
>> What can cause such a slow load of such a small cache? Is there something
>> that can be done to make this quicker?
>>
>> Gil
>>
>


Re: Question about the num_tokens

2021-04-28 Thread onmstester onmstester
Some posts/papers discuss this in more detail, for example this one from
thelastpickle:

https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html

Which says:

Using statistical computation, the point where all clusters of any size always
had a good token range balance was when 256 vnodes were used. Hence, the
num_tokens default value of 256 was recommended by the community to prevent
hot spots in a cluster. The problem here is that the performance of operations
requiring token-range scans (e.g. repairs, Spark operations) will tank big
time. It can also cause problems with bootstrapping due to the large number of
SSTables generated. Furthermore, as Joseph Lynch and Josh Snyder pointed out in
a dev-list post
(http://mail-archives.apache.org/mod_mbox/cassandra-dev/201804.mbox/%3CCALShVHcz5PixXFO_4bZZZNnKcrpph-=5QmCyb0M=w-mhdyl...@mail.gmail.com%3E),
the higher the value of num_tokens in large clusters, the higher the risk of
data unavailability.

On Wed, 28 Apr 2021 10:43:35 +0430 Jai Bheemsen Rao Dhanwada wrote:

Thank you,

Is there a specific reason why Cassandra 4.0 recommends using 16 tokens?

On Tue, Apr 27, 2021 at 11:11 PM Jeff Jirsa  wrote:

On Apr 27, 2021, at 10:47 PM, Jai Bheemsen Rao Dhanwada wrote:

Hello,

I am currently using num_tokens: 256 in my cluster with version 3.11.6, and
when I looked at Cassandra 4.0 I saw num_tokens set to 16.

Is there a specific reason for changing the default value from 256 to 16?

What is the best value to use?

Probably something like 16 on new clusters. If you have an existing cluster,
it's likely not worth the hassle to change it unless it's actively causing you
pain.

If 16 is recommended, is there a way to change num_tokens from 256 to 16 on a
live production cluster?

Not easily, no. You have to add a new data center or similar; see the sketch
at the end of this thread. Lots of effort.

I tried to directly update num_tokens and restart Cassandra, but the process
won't start up, failing with the error below:

org.apache.cassandra.exceptions.ConfigurationException: Cannot change the
number of tokens from 256 to 16

Any suggestions?

Change the yaml back to 256 so it starts.
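
In outline, the new-datacenter route mentioned above looks something like this
(a sketch; keyspace and datacenter names are placeholders):

    # cassandra.yaml on each node of the new DC, before its first start:
    num_tokens: 16

    -- CQL: include the new DC in replication for each relevant keyspace:
    ALTER KEYSPACE my_ks WITH replication =
      {'class': 'NetworkTopologyStrategy', 'dc_old': 3, 'dc_new': 3};

    # On each new node, stream the data over from the old DC; afterwards
    # repoint clients and decommission the old DC:
    nodetool rebuild -- dc_old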

Cassandra 4.0 and python

2021-04-28 Thread Paul Chandler
Hi all,

We have been testing with 4.0~beta2 in our setup for a few weeks and all has
gone very smoothly; however, when we tried to install 4.0~rc1 we ran into
problems with Python versions.

We are on Ubuntu 16.04.7 LTS, so we use apt to install Cassandra, and this now
gives the following error:

The following packages have unmet dependencies:
 cassandra : Depends: python3 (>= 3.6) but 3.5.1-3 is to be installed
E: Unable to correct problems, you have held broken packages.

Looking at the apt packaging, the Python requirement has changed from 2.7 to
3.6 between beta4 and rc1.

I have found https://issues.apache.org/jira/browse/CASSANDRA-16396, which says
it needed to be Python 3.6; however, this ticket seems to imply 2.7 is still
supported: https://issues.apache.org/jira/browse/CASSANDRA-15659

Also the code for cqlsh says it supports 2.7 as well:
https://github.com/apache/cassandra/blob/b0c50c10dbc443a05662b111a971a65cafa258d5/bin/cqlsh#L65

All our clusters are currently on Ubuntu 16.04, which does not come with
Python 3.6, so this is going to be a major pain to upgrade them to 4.0.

Does the apt packaging really need to specify 3.6?

Thanks 

Paul Chandler

Cassandra doesn't flush any commit log files into cdc_raw directory

2021-04-28 Thread Bingqin Zhou
Hi all,

We're working on a Kafka connector to capture data changes in Cassandra by
processing commit log files in the cdc_raw directory. After we enabled CDC
on a few tables, we didn't observe any commit log files getting flushed into
the cdc_raw directory as expected, but instead got WriteTimeoutExceptions in
Cassandra.

Here's how we reproduce the issue:

1. Our Cassandra Settings:

- Cassandra Version: 3.11.9
- Related configs in cassandra.yaml:
   - cdc_enabled: true
   - cdc_total_space_in_mb: 4096
   - commitlog_segment_size_in_mb: 32
   - commitlog_total_space_in_mb: 8192
   - commitlog_sync: periodic
   - commitlog_sync_period_in_ms: 1

2. Enable CDC on a few tables by CQL:
  ALTER TABLE foo WITH cdc=true;

3. After a few days, we get *WriteTimeoutException* in Cassandra. However, at
the same time the cdc_raw directory is still empty, with no commit log
segments flushed/copied into it at all.

I want to understand why no commit log files are flushed into the cdc_raw
directory even when the cdc_total_space_in_mb threshold has been reached and
write suspension has been triggered. This sounds like a bug, and it currently
makes the CDC feature unusable for us.
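
One sanity check we plan to try (a sketch; the keyspace name and data
directory path are placeholders) is forcing a memtable flush, since, as far as
we understand, a commit log segment only moves to cdc_raw once it is
discarded, i.e. after the memtables it covers have been flushed:

    # Flush the CDC-enabled table so its commit log segments can be discarded:
    nodetool flush my_keyspace foo

    # Then check whether any segments landed in cdc_raw:
    ls -lh /var/lib/cassandra/cdc_raw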

Thanks so much,
Bingqin Zhou


tablehistogram shows high sstables

2021-04-28 Thread Ayub M
The table has 24 SSTables with size-tiered compaction. When I run nodetool
tablehistograms, the 99th percentile of queries shows 24 as the number of
SSTables, yet the read latency is very low. My understanding of the SSTables
column in tablehistograms is that it shows how many SSTables were read to
complete the query. If so, reading 24 SSTables should take some time, maybe at
least a couple of seconds. Am I missing something here? Does checking against
the index/bloom filters count towards the SSTable counter as well?
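
For reference, the histogram below came from a command like this (keyspace and
table names are placeholders):

    nodetool tablehistograms my_keyspace my_table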

Percentile  SSTables   Write Latency   Read Latency   Partition Size   Cell Count
                       (micros)        (micros)       (bytes)
50%            24.00           17.08       17436.92              310            6
75%            24.00           24.60       20924.30              446            6
95%            24.00           42.51       62479.63              770           10
98%            24.00           51.01       74975.55             1597           17
99%            24.00           61.21       74975.55             3311           24
Min            18.00            2.30        4866.32               87            0
Max            24.00          943.13       89970.66           545791        17084


Re: Cassandra 4.0 and python

2021-04-28 Thread Kane Wilson
No, I suspect the deb package dependencies haven't been updated
correctly, as 2.7 should definitely still work. Could you raise a JIRA for
this issue?

Not sure if apt has a way to force-install or ignore dependencies; if it does,
that may work. Otherwise your only workaround would be to install from the
tarball.
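
For example, something along these lines might get the package installed
despite the dependency (an untested sketch; the .deb filename is a
placeholder):

    # Fetch the package without installing it:
    apt-get download cassandra

    # Install it while ignoring the python3 version dependency:
    sudo dpkg -i --ignore-depends=python3 cassandra_4.0~rc1_all.deb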

raft.so - Cassandra consulting, support, and managed services


On Thu, Apr 29, 2021 at 2:24 AM Paul Chandler  wrote:

> Hi all,
>
> We have been testing with 4.0~beta2 in our setup for a few weeks and all
> has gone very smoothly; however, when we tried to install 4.0~rc1 we ran
> into problems with Python versions.
>
> We are on Ubuntu 16.04.7 LTS, so we use apt to install Cassandra, and this
> now gives the following error:
>
> The following packages have unmet dependencies:
>  cassandra : Depends: python3 (>= 3.6) but 3.5.1-3 is to be installed
> E: Unable to correct problems, you have held broken packages.
>
> Looking at the apt packaging, the Python requirement has changed from 2.7
> to 3.6 between beta4 and rc1.
>
> I have found https://issues.apache.org/jira/browse/CASSANDRA-16396, which
> says it needed to be Python 3.6; however, this ticket seems to imply 2.7 is
> still supported:
> https://issues.apache.org/jira/browse/CASSANDRA-15659
>
> Also the code for cqlsh says it supports 2.7 as well:
> https://github.com/apache/cassandra/blob/b0c50c10dbc443a05662b111a971a65cafa258d5/bin/cqlsh#L65
>
> All our clusters are currently on Ubuntu 16.04, which does not come with
> Python 3.6, so this is going to be a major pain to upgrade them to 4.0.
>
> Does the apt packaging really need to specify 3.6?
>
> Thanks
>
> Paul Chandler
>