Re: number of racks in a deployment with VMs

2021-02-15 Thread Kane Wilson
There are operational advantages to having #racks == RF; however, it's by no means mandatory. Having more racks than RF doesn't cause any availability/health/balance problems; it is only disadvantageous in that it makes some cluster maintenance tasks more expensive/unwieldy like repairs and DC migra

Re: Snapshots space question

2021-02-16 Thread Kane Wilson
The calculation isn't terribly smart - it simply sums up the size of all the snapshots, which might not be accurate as multiple snapshots may point to the same SSTable as they are hardlinks. It really shouldn't be called TrueDiskSpaceUsed, but for some reason no one is considering it a bug. https:
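To illustrate the over-counting described above, here is a minimal Python sketch (not Cassandra's actual code) that compares a naive per-snapshot sum with a sum that counts each hardlinked file only once. The function names, the directory layout, and the file name `fake-Data.db` are assumptions made up for the demonstration.

```python
import os
import tempfile


def naive_snapshot_size(snapshot_dirs):
    """Sum file sizes across snapshots, counting every hardlink separately."""
    total = 0
    for snap in snapshot_dirs:
        for root, _dirs, files in os.walk(snap):
            for name in files:
                total += os.stat(os.path.join(root, name)).st_size
    return total


def deduplicated_snapshot_size(snapshot_dirs):
    """Sum file sizes, counting each underlying inode only once."""
    seen = set()
    total = 0
    for snap in snapshot_dirs:
        for root, _dirs, files in os.walk(snap):
            for name in files:
                st = os.stat(os.path.join(root, name))
                key = (st.st_dev, st.st_ino)  # identifies the real file on disk
                if key not in seen:
                    seen.add(key)
                    total += st.st_size
    return total


def demo():
    """Build two fake snapshots that hardlink the same 1000-byte file."""
    with tempfile.TemporaryDirectory() as base:
        sstable = os.path.join(base, "fake-Data.db")  # hypothetical file name
        with open(sstable, "wb") as fh:
            fh.write(b"x" * 1000)
        snaps = []
        for name in ("snap1", "snap2"):
            snap = os.path.join(base, name)
            os.mkdir(snap)
            os.link(sstable, os.path.join(snap, "fake-Data.db"))
            snaps.append(snap)
        return naive_snapshot_size(snaps), deduplicated_snapshot_size(snaps)
```

Calling `demo()` on a file system that supports hardlinks shows the naive figure growing with every extra snapshot of the same file, while the deduplicated figure stays at the size of the single underlying file.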

Re: Understanding logging in Cassandra

2021-02-17 Thread Kane Wilson
You can configure the log level of a specific package through logback, so you can silence any noisy packages by setting their level higher than debug. See http://logback.qos.ch/manual/configuration.html?source=post_page---#rootElement Cheers, Kane raft.so - Cassandra con

Re: Understanding logging in Cassandra

2021-02-17 Thread Kane Wilson
ssages, which are very high in volume. >> >> These messages shouldn't be very high volume as they only appear when >> there are ring updates, schema changes or node startup. If this is not the >> case please file a JIRA issue. >> >> On Wed., 17 Feb. 2021

Re: Cassandra 4.0 and changing DC setting

2021-02-21 Thread Kane Wilson
There have been proposals to add a force/unsafe flag to alter DC but it hasn't been actioned and at this rate seems unlikely to make it into 4.0. There is, however, a workaround, albeit not a very user-friendly one. You should be able to modify the system_schema tables directly to do your DC updates. I am y

Re: Cassandra 4.0 and changing DC setting

2021-02-21 Thread Kane Wilson
> > Thanks > > Paul > > On 21 Feb 2021, at 11:33, Kane Wilson wrote: > > There has been proposals to add a force/unsafe flag to alter DC but it > hasn't been actioned and at this rate seems unlikely to make it into 4.0. > There is however a workaround, albeit not

Re: Cassandra 4.0 and changing DC setting

2021-02-22 Thread Kane Wilson
> > This will then allow you to log in after the nodes come back up, and the > other keyspaces can be changed as normal afterwards. > > Thanks for you help. > > Paul > > > On 21 Feb 2021, at 22:30, Kane Wilson wrote: > > Make sure you test it on a practice cluster. Me

Re: How to restore single Kubernetes node?

2021-02-24 Thread Kane Wilson
Someone will correct me if I'm wrong, but I don't believe any of the Cassandra operators currently support backup and restore. You'll have to perform a traditional node replacement. Cheers, Kane raft.so - Cassandra consulting, support, managed services On Wed., 24 Feb. 2021, 18:31 Pushpendra Raj

Re: Understanding which table had digest mismatch

2021-02-25 Thread Kane Wilson
You should be able to use the Table metric ReadRepairRequests to determine which table has read repairs occurring (fairly sure it's present on 3.11). See https://cassandra.apache.org/doc/latest/operating/metrics.html#table-metrics Cheers, Kane raft.so - Cassandra consulting, support, and managed se

Re: using zstd cause high memtable switch count

2021-02-28 Thread Kane Wilson
Did you also backport https://github.com/apache/cassandra/commit/9c1bbf3ac913f9bdf7a0e0922106804af42d2c1e to still use LZ4 for flushing? I would be curious if this is a side effect of using zstd for flushing. raft.so - Cassandra consulting, support, and managed services On Sun, Feb 28, 2021 at 9

Re: Cassandra on arm aws instances

2021-02-28 Thread Kane Wilson
If anyone has tried it, it hasn't been publicized. I wouldn't anticipate any real issues because it's all Java, but given we don't test on ARM you should definitely test it out before making the switch in prod. raft.so - Cassandra consulting, support, and managed services On Sun, Feb 28, 2021 at 7:0

Re: Impact analysis of upgrading RHEL/SLES OS

2021-02-28 Thread Kane Wilson
OS upgrades most often don't have any impact on C*, and you'd likely only expect trouble on a major upgrade, but this would typically be related to package installation, rather than any effects on the database itself. I wouldn't be too worried about going from RHEL7.4 to 7.9, however going to 8 you

Re: How to debug node load unbalance

2021-03-03 Thread Kane Wilson
The load calculation always has issues so I wouldn't count on it, although in this case it does seem to roughly line up. Are you sure your ring calculation was accurate? It doesn't really seem to line up with the owns % for the 33% node, and it is feasible (although unlikely) that you could roll a

Re: How to debug node load unbalance

2021-03-03 Thread Kane Wilson
ce in usage. > > All the involved keyspaces have an identical RF of: > {'class': 'NetworkTopologyStrategy', 'Hetzner': 3, 'DR': 1} > > Hypothetically speaking, how would I obtain "the tokens that will > balance the ring"? Are there to
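As a rough illustration of what "tokens that balance the ring" means in the simplest case, the following Python sketch computes evenly spaced tokens. It assumes Murmur3Partitioner (token range -2^63 to 2^63-1), a single token per node, and a single datacenter; the function name is hypothetical, and clusters using vnodes or several DCs, as in this thread, need more care than this.

```python
MIN_TOKEN = -(2**63)   # lowest Murmur3Partitioner token
RING_SIZE = 2**64      # number of distinct tokens on the ring


def evenly_spaced_tokens(node_count):
    """Return one token per node, spaced evenly around the ring."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    step = RING_SIZE // node_count
    return [MIN_TOKEN + i * step for i in range(node_count)]
```

For example, `evenly_spaced_tokens(4)` yields `-2**63`, `-2**62`, `0` and `2**62`, so each node owns a quarter of the ring.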

Re: Node removal causes spike in pending native-transport requests and clients suffer

2021-03-09 Thread Kane Wilson
It's unlikely to help in this case, but you should be using nodetool decommission on the node you want to remove rather than removenode from another node (and definitely don't force removal). native_transport_max_concurrent_requests_in_bytes defaults to 10% of the heap, which I suppose depending on

Re: Node removal causes spike in pending native-transport requests and clients suffer

2021-03-11 Thread Kane Wilson
nodes (few mb's compared to the default 0.8 of a 60gb heap), it > looks like it's good enough for the app, will roll it out to the entire dc > and test removal again. > > > On Tue, Mar 9, 2021 at 10:51 AM Kane Wilson wrote: > >> It's unlikely to help in this

Re: Restore of system_auth data to new cluster

2021-03-15 Thread Kane Wilson
Keep in mind that you'll need the same tokens for each node for your restore to work if RF < #Nodes. There is an easy way to work around this though by setting RF=# of nodes on the system_auth keyspace (and do a repair of it) before you take the backup, then restore system_auth to every node. If yo

Re: No node was available to execute query error

2021-03-17 Thread Kane Wilson
I would avoid secondary indexes unless absolutely necessary. It will be much better to have an efficient partition key + clustering key, as for any given partition C* will know exactly what nodes to contact, whereas this may not be the case with secondary indexes and you'll still likely need to tal

Re: Fatal Java error when starting cassandra

2021-03-18 Thread Kane Wilson
Cassandra is not tested on Windows and isn't officially supported or widely used in that environment, so you're unlikely to find much help. This seems to be file system related, which could be a change in Windows or it could be a result of copying? Were there symlinks or directory junctions or some

Re: Changing num_tokens and migrating to 4.0

2021-03-21 Thread Kane Wilson
You should be able to get repairs working fine if you use a tool such as cassandra-reaper to manage it for you for such a small cluster. I would look into that before doing major cluster topology changes, as these can be complex and risky. I definitely wouldn't go about it in the way you've describ

Re: Best strategy to run repair

2021-03-22 Thread Kane Wilson
-pr on all nodes takes much longer as you'll do at least triple the amount of merkle calculations I believe (with RF 3) and tends to be quite problematic. Subrange is the way to go, which is what cassandra-reaper will do for you if you have it set up. raft.so - Cassandra consulting, support, and

Re: Dont want to split sstables for repaired and non repaired while repairing with -pr option

2021-03-24 Thread Kane Wilson
Yes, you should avoid doing incremental repairs. Either use nodetool repair -full or, ideally, use subrange repair (-st and -et). You probably want to look into cassandra-reaper, the commonly accepted repair management tool; it performs subrange repairs. Cheers, Kane raft.so - Cassandra consulting, s
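The idea behind subrange repair can be sketched in Python as follows: split a token range into smaller contiguous pieces and repair each piece with its own -st/-et invocation. This is only an illustration of the splitting arithmetic, not how cassandra-reaper is implemented; the function names and the keyspace argument are assumptions, and a real tool also aligns subranges with the token ranges each node actually owns, which this sketch ignores.

```python
def split_range(start, end, parts):
    """Split the token range (start, end] into `parts` contiguous subranges."""
    if parts < 1 or end <= start:
        raise ValueError("need parts >= 1 and end > start")
    width = end - start
    # Integer arithmetic keeps the bounds exact even for 64-bit tokens.
    bounds = [start + (width * i) // parts for i in range(parts)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


def repair_commands(keyspace, start, end, parts):
    """Build one `nodetool repair -st ... -et ...` command per subrange."""
    return [
        f"nodetool repair -st {s} -et {e} {keyspace}"
        for s, e in split_range(start, end, parts)
    ]
```

For instance, `repair_commands("my_ks", 0, 100, 4)` produces four commands covering (0, 25], (25, 50], (50, 75] and (75, 100]; `my_ks` is a placeholder keyspace name.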

Re: Repair on a slow node (or is it?)

2021-03-29 Thread Kane Wilson
Check what your compactionthroughput is set to, as it will impact the validation compactions. Also, what kind of disks does the DR node have? The validation compaction sizes are likely fine; I'm not sure of the exact details, but it's normal to expect very large validations. Rebuilding would not be

Re: Memory Map settings for Cassandra

2021-04-15 Thread Kane Wilson
Cassandra mmaps SSTables into memory, of which there can be many files (including all their indexes and what not). Typically it'll do so greedily until you run out of RAM. 65k map areas tends to be quite low and can easily be exceeded - you'd likely need very low density nodes to avoid going over 6
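To see how close a process is to the map-area limit on Linux, one can count the lines in /proc/<pid>/maps (one line per mapping) and compare that with vm.max_map_count. The Python sketch below does this; the function names and the `proc_root` parameter are assumptions for illustration, and it only works on Linux with permission to read the target process's maps file.

```python
from pathlib import Path


def count_map_areas(maps_text):
    """Count mappings in the text of a /proc/<pid>/maps file (one per line)."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def map_usage(pid, proc_root="/proc"):
    """Return (mappings in use, vm.max_map_count) for the given process ID."""
    root = Path(proc_root)
    limit = int((root / "sys" / "vm" / "max_map_count").read_text())
    used = count_map_areas((root / str(pid) / "maps").read_text())
    return used, limit
```

Running `map_usage(pid)` against the Cassandra process ID gives a quick sense of whether the node is approaching the limit.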

Re: Memory Map settings for Cassandra

2021-04-15 Thread Kane Wilson
index_only" ? does this hold true even for higher workloads with >> larger datasets like ~1TB per node? >> >> On Thu, Apr 15, 2021 at 4:43 PM Jeff Jirsa wrote: >> >>> disk_access_mode = mmap_index_only to use fewer maps (or disable it >>> en

Re: Log Rotation of Extended Compaction Logging

2021-04-15 Thread Kane Wilson
Correct. It's also worth noting that if you delete log files and restart C* the CompactionLogger will then find the earliest available file number starting from 0. You'll have to explore what you can use to configure proper log rotation as the CompactionLogger doesn't use the logging system to writ

Re: Huge single-node DCs (?)

2021-04-15 Thread Kane Wilson
4.0 has gone a ways to enable better densification of nodes, but it wasn't a main focus. We're probably still only thinking that 4TB - 8TB nodes will be feasible (and then maybe only for expert users). The main problems tend to be streaming, compaction, and repairs when it comes to dense nodes. Eb

Re: counter cache loading very slow

2021-04-26 Thread Kane Wilson
Sounds like you're potentially hitting a bug, maybe even one that hasn't been hit before. How are you determining it's counters that are the problem? Is it stalling on the Initializing counters log line or something? raft.so - Cassandra consulting, support, and managed services On Mon, Apr 26, 2

Re: Cassandra 4.0 and python

2021-04-28 Thread Kane Wilson
No, I suspect the deb package dependencies haven't been updated correctly, as 2.7 should definitely still work. Could you raise a JIRA for this issue? Not sure if apt has some way to force install/ignore dependencies, however if you do that it may work, otherwise your only workaround would be to i

Re: tablehistogram shows high sstables

2021-04-29 Thread Kane Wilson
It does imply the SSTables are being read - how big is your data size and how much memory on the nodes? It's certainly possible to get low latencies despite many SSTables, but I'd expect small read sizes paired with a lot of memory. raft.so - Cassandra consulting, support, managed services On Th

Re: How to make Cassandra flush CommitLog files more frequently?

2021-05-03 Thread Kane Wilson
(removing dev) commitlog_segment_size_in_mb isn't going to help; in fact, you probably don't want to modify this as it'll reduce the maximum size of your mutations. Reducing the total space on its own will help; however, definitely test this, as such a large drop could result in a massive increase in

Re: How to make Cassandra flush CommitLog files more frequently?

2021-05-05 Thread Kane Wilson
is used for streaming data changes in > Cassandra with low latency. If this is not the case, may I understand > what's the purpose and the intended use case for the CDC feature in > Cassandra please? > > Thank you so much! > Bingqin Zhou > > On Mon, May 3, 2021 at 5:00

Re: Getting error: "no connections were made when creating the session"

2021-05-06 Thread Kane Wilson
Neither of those changes should have had any effect on client connections, as they only relate to internode communication. Your Go client should have specified the public IP addresses as per the broadcast RPC address, and would need to be specifying the live node as the connection point to work. I

Re: Getting error: "no connections were made when creating the session"

2021-05-06 Thread Kane Wilson
On Fri, May 7, 2021 at 1:00 PM MyWorld wrote: > Thanks Kane. But go client connection settings contain only the public > address of a single live node. Other nodes were already removed from the > connection string. Moreover, we get the server reboot in order to clear any > cache. Still the error

Re: RC1 - joining cluster

2021-05-09 Thread Kane Wilson
How long are you waiting for the node to join? Have you checked nodetool netstats and compactionstats to see if all streams/compactions are complete? raft.so - Cassandra consulting, support, and managed services On Sat, May 8, 2021 at 11:23 AM Joe Obernberger < joseph.obernber...@gmail.com> wrot

Re: Counter errors - RC1

2021-05-10 Thread Kane Wilson
Is it intentional that some of your nodes have varying numbers of tokens? It seems like some of your nodes are overloaded, potentially at least #RF of them. If nodes are heavily overloaded, GC tuning generally won't help much; you're best off starting b

Re: RC1 - joining cluster

2021-05-10 Thread Kane Wilson
" to false on both new machines and they joined. Then ran a repair. > > -Joe > On 5/9/2021 7:12 PM, Kane Wilson wrote: > > How long are you waiting for the node to join? Have you checked nodetool > netstats and compactionstats to see if all streams/compactions are complete? > >

Re: Suggestions on Running UpgradeSSTables

2021-05-19 Thread Kane Wilson
On Thu, May 20, 2021 at 11:17 AM Jai Bheemsen Rao Dhanwada < jaibheem...@gmail.com> wrote: > Thanks for the response, > > Is there a limit on how long I can run in mixed mode? Let's say if > datacenter 1 is upgraded and upgradesstables was run on day 1 and > datacenter 3 is upgraded and upgradesst

Re: RC1 - joining cluster

2021-05-20 Thread Kane Wilson
> at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:506)
> at org.apache.cassandra.streaming.StreamSession.complete(StreamSession.java:837)
> at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:596)

Re: Cassandra 4.0 GA

2021-05-26 Thread Kane Wilson
On Tue, May 25, 2021 at 6:36 AM Jai Bheemsen Rao Dhanwada < jaibheem...@gmail.com> wrote: > Hello All, > > I see that Cassandra 4.0 RC1 is released in April, is there going to be an > official 4.0 GA release or is RC1 considered as an official GA release with > Production use? If not is there a te

Re: unable to repair

2021-05-26 Thread Kane Wilson
> > I have had that error sometimes when schema mismatch but also when all > schema match. So I think this is not the only cause. > Have you checked the logs for errors on 135.181.222.100, 135.181.217.109, and 135.181.221.180? They may give you some better information about why they are sending bad

Re: unable to repair

2021-05-27 Thread Kane Wilson
> > Which client operations could trigger schema change at node level? Do you > mean that for ex creating a new table trigger a schema change globally, not > only at KS/table single level? > Yes, any DDL statement (creating tables, altering, dropping, etc) triggers a schema change across the cluste

Re: Hints not being created

2021-05-27 Thread Kane Wilson
Hey > I have a cluster, 3.11.9, in which I am enabling hints, but it appears > that no hints are created when other nodes are down. > I can see that hints are enabled by running "nodetool statushandoff", and > gc_grace_seconds is high enough on all tables, so I'm expecting to see > hints being crea

Re: TWCS repair and compact help

2021-06-29 Thread Kane Wilson
> > Oh. So our data is all messed up now because of the “nodetool compact” I > ran. > > > > Hi Erick. Thanks for the quick reply. > > > > I just want to be sure about compact. I saw Cassandra will do compaction > by itself even when I do not run “nodetool compact” manually (nodetool > compaction

Re: Soon After Starting c* Process: CPU 100% for java Process

2021-06-30 Thread Kane Wilson
Looks like it's doing a lot of reads immediately on startup (AbstractQueryPager) which is potentially causing a lot of GC (guessing that's what caused the StatusLogger). DEBUG [SharedPool-Worker-113] 2021-06-30 13:39:04,766 AbstractQueryPager.java:133 - Remaining rows to page: 2147483646 is quite

Re: Soon After Starting c* Process: CPU 100% for java Process

2021-07-01 Thread Kane Wilson
> > Eventually, the shared pool worker crashes > -- > b-3223-big-Data.db > WARN [SharedPool-Worker-55] 2021-06-30 19:55:41,677 > AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-55,5,mai

Re: How to remove tombstones in a levelled compaction table in Cassandra 2.1.16?

2021-07-05 Thread Kane Wilson
In one of our LCS table auto compaction was disabled. Now after years of > run, range queries using spark-cassandra-connector are failing. Cassandra > version is 2.1.16. > > I suspect due to disabling of autocompaction lots of tombstones got > created. And now while reading those are creating issue