Thanks, Jeff, for your response.
Do you see any risk in the following approach?
1. Stop the node.
2. Remove all sstable files from
/var/lib/cassandra/data/keyspace/tablename-23dfadf32adf33d33s333s33s3s33
directory.
3. Start the node.
4. Run full repair on this particular table
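A minimal sketch of those four steps with standard tooling; the service name,
paths and keyspace/table names below are placeholders, not taken from this thread:

# 1. stop the node
sudo systemctl stop cassandra
# 2. remove the SSTable files for just this table
sudo rm -f /var/lib/cassandra/data/<keyspace>/<table>-<table_id>/*
# 3. start the node again
sudo systemctl start cassandra
# 4. run a full repair on just this table
nodetool repair -full <keyspace> <table>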
I wanted to go thi
Agree this is both strictly possible and more common with LCS. The only
thing that's strictly correct to do is treat every corrupt sstable
exception as a failed host, and replace it just like you would a failed
host.
On Thu, Feb 13, 2020 at 10:55 PM manish khandelwal <
manishkhandelwa...@gmail.co
Thanks Erick
I would like to explain how data resurrection can take place with a single
SSTable deletion.
Consider this case of a table with Leveled Compaction Strategy:
1. Data A is written a long time back.
2. Data A is deleted and a tombstone is created.
3. After GC grace, the tombstone is purgeable.
4. No
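A minimal sketch of that timeline; the keyspace/table names, replication
factor and gc_grace value are assumptions, not taken from this thread:

cqlsh <<'CQL'
CREATE KEYSPACE IF NOT EXISTS ks
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
CREATE TABLE IF NOT EXISTS ks.t (id int PRIMARY KEY, val text)
    WITH compaction = {'class': 'LeveledCompactionStrategy'}
    AND gc_grace_seconds = 864000;              -- 10 days (the default)
INSERT INTO ks.t (id, val) VALUES (1, 'A');     -- "Data A"; eventually flushed into an (older) SSTable
DELETE FROM ks.t WHERE id = 1;                  -- tombstone lands in a newer SSTable
CQL
# once gc_grace_seconds have elapsed the tombstone is purgeable; if the SSTable
# holding it goes away (deleted or compacted out) while an older SSTable still
# holds 'A', the deletion is forgotten and 'A' can come back.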
The log shows that the problem occurs when decompressing the SSTable,
but there's not much actionable info from it.
> I would like to know what the "ordinary hammer" would be in this case. Do
> you want to suggest that deleting only the corrupt sstable file (in this
> case mc-1234-big-*.db) would be sufficient?
Hi Erick
Thanks for your quick response. I have attached the full stacktrace, which
shows the exception during the validation phase of the table repair.
I would like to know what the "ordinary hammer" would be in this case. Do
you want to suggest that deleting only the corrupt sstable file (in this
case mc-1234-big-*.db) would be sufficient?
Feels that way and most people don’t do it, but definitely required for strict
correctness.
> On Feb 13, 2020, at 8:57 PM, Erick Ramirez wrote:
>
>
> Interesting... though it feels a bit extreme unless you're dealing with a
> cluster that's constantly dropping mutations. In which case, you
It will achieve the outcome you are after but I doubt anyone would
recommend that approach. It's like using a sledgehammer when an ordinary
hammer would suffice. And if you were hitting some bug then you'd run into
the same problem anyway.
Can you post the full stack trace? It might provide us som
Hi Eric
Thanks for reply.
The reason for the corruption is unknown to me. I just found the corrupt
table when a scheduled repair failed with logs showing:
ERROR [ValidationExecutor:16] 2020-01-21 19:13:18,123
CassandraDaemon.java:228 - Exception in thread
Thread[ValidationExecutor:16,1,main]org.apach
Interesting... though it feels a bit extreme unless you're dealing with a
cluster that's constantly dropping mutations. In which case, you have
bigger problems anyway. :)
Option 1 is only strictly safe if you run repair while the down replica is
down (otherwise you violate quorum consistency guarantees).
Option 2 is probably easier to manage and won't require any special effort
to avoid violating consistency.
I'd probably go with option 2.
On Thu, Feb 13, 2020 at
Thank you for the advice!
Best!
Sergio
On Thu, Feb 13, 2020, 7:44 PM Erick Ramirez
wrote:
> Option 1 is a cheaper option because the cluster doesn't need to rebalance
> (with the loss of a replica) post-decommission then rebalance again when
> you add a new node.
>
> The hints directory on EB
Not a problem. And I've just responded on the new thread. Cheers! 👍
>
Option 1 is a cheaper option because the cluster doesn't need to rebalance
(with the loss of a replica) post-decommission then rebalance again when
you add a new node.
The hints directory on EBS is irrelevant because it would only contain
mutations to replay to down replicas if the node was a coor
Thank you very much for this helpful information!
I opened a new thread for the other question :)
Sergio
Il giorno gio 13 feb 2020 alle ore 19:22 Erick Ramirez <
erick.rami...@datastax.com> ha scritto:
> I want to have more than one seed node in each DC, so unless I don't
>> restart the node af
>
> I want to have more than one seed node in each DC, but unless I restart
> the node after changing the seed_list on that node, it will not become a
> seed.
That's not really going to hurt you if you have other seeds in other DCs.
But if you're willing to take the hit from the restart the
We have i3.xlarge instances with the data directory on an ephemeral XFS
filesystem, and *hints*, *commit_log* and *saved_caches* on an EBS volume.
Whenever AWS is going to retire the instance due to degraded hardware
performance, is it better:
Option 1)
- Nodetool drain
- Stop cassandra
Right now yes I have one seed per DC.
I want to have more than one seed node in each DC, but unless I restart
the node after changing the seed_list on that node, it will not become a
seed.
Do I need to update the seed_list across all the nodes, even in separate DCs,
and perform a rolling restart?
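(For reference, the mechanics of such a rolling change usually look something
like the sketch below; the yaml path, service name and IPs are assumptions,
and this isn't a recommendation on whether the change is needed.)

# on each node, one node at a time:
# 1. update the seeds entry in cassandra.yaml, e.g.
#      seed_provider: ... seeds: "10.0.1.10,10.0.2.10"
sudo vi /etc/cassandra/cassandra.yaml
# 2. drain and restart so the node picks up the new seed list
nodetool drain && sudo systemctl restart cassandra
# 3. wait until the node is back in UN before touching the next one
nodetool status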
>
> 1) If I don't restart the node after changing the seed list, it will
> never become a seed, and I would like to be sure that I don't find myself
> in a spot where I have no seed nodes, which would mean I cannot add a
> node to the cluster.
Are you saying you only have 1 seed node in t
Thank you very much for your response!
2 things:
1) If I don't restart the node after changing the seed list, it will never
become a seed, and I would like to be sure that I don't find myself in a
spot where I have no seed nodes, which would mean I cannot add a node to
the cluster.
2) We
>
> I decommissioned this node and did all the steps mentioned except
> the -Dcassandra.replace_address, and now it is streaming correctly!
That works too but I was trying to avoid the rebalance operations (like
streaming to restore replica counts) since they can be expensive.
So basically
I decommissioned this node and did all the steps mentioned except
the -Dcassandra.replace_address, and now it is streaming correctly!
So basically, if I want this new node as a seed, should I add its IP address
after it has joined the cluster and after:
- nodetool drain
- restart cassandra?
I deact
>
> Should I do something to fix it or leave it as is?
It depends on what your intentions are. I would use the "replace" method to
build it correctly. At a high level:
- remove the IP from its own seeds list
- delete the contents of data, commitlog and saved_caches
- add the replace flag in cassand
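A rough sketch of those high-level steps on the node being rebuilt; the paths,
service name and IP are placeholders, and where the flag goes (cassandra-env.sh
vs jvm.options) depends on how the node is installed:

sudo systemctl stop cassandra
# make sure this node's own IP is NOT in the seeds list in cassandra.yaml
sudo rm -rf /var/lib/cassandra/data/* \
            /var/lib/cassandra/commitlog/* \
            /var/lib/cassandra/saved_caches/*
# example placement of the replace flag in cassandra-env.sh
echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<ip_of_node_being_replaced>"' \
    | sudo tee -a /etc/cassandra/cassandra-env.sh
sudo systemctl start cassandra
# remove the flag again once the replacement has finished streaming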
Thanks for your fast reply!
No repairs are running!
https://cassandra.apache.org/doc/latest/faq/index.html#does-single-seed-mean-single-point-of-failure
I added the node's own IP and the IPs of the existing seeds, and then I
started Cassandra.
So the right procedure is not to add the new node to the seed list?
>
> I wanted to add a new node to the cluster and it looks to be working fine,
> but instead of waiting 2-3 hours while it streamed roughly 100GB of data, it
> immediately went to the UN (UP and NORMAL) state.
>
Are you running a repair? I can't see how it's possibly receiving 100GB
since it won't bootstrap.
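A couple of quick ways to see what the node is actually doing (the log path is
an assumption for a package install):

nodetool netstats      # any active streaming sessions?
nodetool status        # UN state and load per node
grep -iE 'joining|bootstrap' /var/log/cassandra/system.log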
Should I do something to fix it or leave it as is?
On Thu, Feb 13, 2020, 5:29 PM Jon Haddad wrote:
> Seeds don't bootstrap, don't list new nodes as seeds.
>
> On Thu, Feb 13, 2020 at 5:23 PM Sergio wrote:
>
>> Hi guys!
>>
>> I don't know how but this is the first time that I see such behavior. I
>
Seeds don't bootstrap, don't list new nodes as seeds.
On Thu, Feb 13, 2020 at 5:23 PM Sergio wrote:
> Hi guys!
>
> I don't know how but this is the first time that I see such behavior. I
> wanted to add a new node in the cluster and it looks to be working fine but
> instead to wait for 2-3 hours
Hi guys!
I don't know how, but this is the first time that I see such behavior. I
wanted to add a new node to the cluster and it looks to be working fine, but
instead of waiting 2-3 hours while it streamed roughly 100GB of data, it
immediately went to the UN (UP and NORMAL) state.
I saw a bunch of exceptions in t
Paul, if you do a sstabledump in C* 3.0 (before upgrading) and compare it
to the dump output after upgrading to C* 3.11 then you will see that the
cell names in the outputs are different. This is the symptom of the broken
serialization header which leads to various exceptions during compactions
and
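A sketch of that comparison; the SSTable filename below is just an example:

# on 3.0, before the upgrade
sstabledump mc-1234-big-Data.db > before.json
# ...upgrade the node to 3.11...
sstabledump mc-1234-big-Data.db > after.json
diff before.json after.json   # differing cell names point at the broken header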
You need to stop C* in order to run the offline sstable scrub utility.
That's why it's referred to as "offline". :)
Do you have any idea on what caused the corruption? It's highly unusual
that you're thinking of removing all the files for just one table.
Typically if the corruption was a result of
- Verify that nodetool upgradesstables has completed successfully on all
nodes from any previous upgrade
- Turn off repairs and any other streaming operations (add/remove nodes)
- Nodetool drain on the node that needs to be stopped (seeds first,
preferably)
- Stop an un-upgraded n
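For each node, the drain/stop part of that checklist typically looks something
like this (service name and upgrade mechanism are assumptions):

nodetool drain                 # flush memtables, stop accepting writes
sudo systemctl stop cassandra
# upgrade the Cassandra binaries (package manager / tarball), then:
sudo systemctl start cassandra
nodetool upgradesstables       # rewrite SSTables in the new format once up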
Hi
I see a corrupt SSTable in one of my keyspace tables on one node. The cluster
has 3 nodes with replication factor 3. The Cassandra version is 3.11.2.
I am thinking along the following lines to resolve the corrupt SSTable issue:
1. Run nodetool scrub.
2. If step 1 fails, run the offline sstablescrub.
3. If step 2 fails,
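For steps 1 and 2, the standard tooling looks roughly like this (keyspace and
table names are placeholders):

# step 1: online scrub, node stays up
nodetool scrub <keyspace> <table>
# step 2: offline scrub, Cassandra must be stopped first
sudo systemctl stop cassandra
sstablescrub <keyspace> <table>
sudo systemctl start cassandra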
thank you
On Thu, Feb 13, 2020 at 6:30 AM Durity, Sean R
wrote:
> I will just add-on that I usually reserve security changes as the primary
> exception where app downtime may be necessary with Cassandra. (DSE has some
> Transitional tools that are useful, though.) Sometimes a short outage is
> p
Since ping is ICMP, not TCP, you probably want to investigate a mix of TCP and
CPU stats to see what is behind the slow pings. I’d guess you are getting
network impacts beyond what the ping times are hinting at. ICMP isn’t subject
to retransmission, so your TCP situation could be far worse than
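A few hedged starting points for that digging, assuming a Linux host:

netstat -s | grep -iE 'retrans|segments'   # TCP retransmission counters
ss -ti                                     # per-connection RTT, cwnd, retransmits
mpstat -P ALL 1 5                          # per-core CPU usage
sar -n DEV 1 5                             # NIC throughput and errors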
Hi all,
I have looked at the release notes for the upcoming release 3.11.6 and seen
the part about corruption of frozen UDT types during upgrades from 3.0.
We have a number of clusters using UDTs and have been upgrading to 3.11.4 and
haven't noticed any problems.
In the ticket (CASSANDRA-15035
+1 on nodetool drain. I added that to our upgrade automation and it really
helps with post-upgrade start-up time.
Sean Durity
From: Erick Ramirez
Sent: Wednesday, February 12, 2020 10:29 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Cassandra 3.11.X upgrades
Yes to the steps. The on
I will just add-on that I usually reserve security changes as the primary
exception where app downtime may be necessary with Cassandra. (DSE has some
Transitional tools that are useful, though.) Sometimes a short outage is
preferred over a longer, more-complicated attempt to keep the app up. And
>
> Last question: In all your experiences, how high can the latency (simple
> ping response times) go before it becomes a problem? (Obviously the lower
> the better, but is there some sort of cut-off/formula where problems can be
> expected intermittently, like the connection resets?)
Unfortunately