Re: Cassandra upgrade from 2.2.8 to 3.10

2018-03-28 Thread Fred Habash
DC with > version 3.10 installed, will nodes in DC3 join the cluster with data > without issues? > > Thanks/Asad > > -- Thank you ... Fred Habash, Database Solutions Architect (Oracle OCP 8i,9i,10g,11g)

Read Latency Doubles After Shrinking Cluster and Never Recovers

2018-06-11 Thread Fred Habash
I have hit dead ends everywhere I turned on this issue. We had a 15-node cluster that was doing 35 ms all along for years. At some point, we made a decision to shrink it to 13. Read latency rose to near 70 ms. Shortly after, we decided this was not acceptable, so we added the three nodes back in

Long-running Job to Extract Data Times Out After ~ 60 Hours

2018-10-03 Thread Fred Habash
We tried to extract a large volume of data from a 42-node cluster about three times and, in all attempts, the client session aborts after ~ 60 hours. Here's what we see in the client logs. I have reviewed the multiple timeout settings in C*, but none seemed to relate to the 60 hrs limit. What is pot

Timestamp of Last Repair

2018-12-11 Thread Fred Habash
We are trying to detect a scenario where some of our smaller clusters go un-repaired for extended periods of time, mostly due to defects in deployment pipelines or human errors. We would like to automate a check for clusters where nodes go un-repaired for more than 7 days, to shoot out an exc

Re: Timestamp of Last Repair

2018-12-18 Thread Fred Habash
4119c9d8] > new session: > RepairSession.java (line 282) [repair #2e7009b0-c03d-11e4-9012-99a64119c9d8] > session completed successfully > > 2. In table you can check: started_at and finished_at field in > system_distributed.parent_repair_history > > regards, > Laxmikant
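
A minimal sketch of the 7-day check discussed in this thread, assuming the repair records have already been read from system_distributed.parent_repair_history into plain dictionaries. The finished_at field comes from the reply above; the keyspace_name key, the function name, and the sample rows are illustrative assumptions, not output from a real cluster.

```python
from datetime import datetime, timedelta


def find_stale_keyspaces(rows, now, max_age=timedelta(days=7)):
    """Return keyspaces whose latest finished repair is older than max_age.

    rows: iterable of dicts with 'keyspace_name' (assumed column name) and
    'finished_at' (a datetime, or None if the repair never completed).
    """
    latest = {}
    for row in rows:
        ks = row["keyspace_name"]
        finished = row.get("finished_at")
        latest.setdefault(ks, None)
        # Keep only the most recent completed repair per keyspace.
        if finished is not None and (latest[ks] is None or finished > latest[ks]):
            latest[ks] = finished
    # A keyspace is stale if no repair ever finished or the last one is too old.
    return sorted(
        ks for ks, ts in latest.items() if ts is None or now - ts > max_age
    )


# Hypothetical sample data, for illustration only.
now = datetime(2018, 12, 18)
rows = [
    {"keyspace_name": "app", "finished_at": datetime(2018, 12, 15)},
    {"keyspace_name": "app", "finished_at": datetime(2018, 12, 1)},
    {"keyspace_name": "audit", "finished_at": datetime(2018, 12, 5)},
    {"keyspace_name": "tmp", "finished_at": None},
]
stale = find_stale_keyspaces(rows, now)
print(stale)
```

A keyspace that has never been repaired at all would have no rows to inspect, so a real check would probably also compare the result against the full list of keyspaces.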

Re: Bootstrapping to Replace a Dead Node vs. Adding a New Node: Consistency Guarantees

2019-05-01 Thread Fred Habash
Thank you. Range movement is one reason this is enforced when adding a new node. But what about forcing a consistent bootstrap, i.e. bootstrapping from the primary owner of the range and not a secondary replica? How is consistent bootstrap enforced when replacing a dead node? - Thank you.

Re: Bootstrapping to Replace a Dead Node vs. Adding a New Node: Consistency Guarantees

2019-05-01 Thread Fred Habash
I, probably, should've been clearer in my inquiry ... I'm investigating a scenario where our diagnostic data is telling us that a small portion of application data has been lost. I mean, getsstables for the keys returns zero on all cluster nodes. The last pickle article below (which includes a case

Re: CL=LQ, RF=3: Can a Write be Lost If Two Nodes ACK'ing it Die

2019-05-03 Thread Fred Habash
Thank you all. So, please, bear with me for a second. I'm trying to figure out how data can be totally lost under the above circumstances when nodes die in two out of three racks. You stated "the replica may or may not have made its way to the third node". Why "may not"? This is what I ca
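
A small sketch of the arithmetic behind this question, under stated assumptions: with RF=3 a quorum is 2 acknowledgements, so a write can be reported successful while only two replicas hold it, and the third replica may have dropped or timed out the mutation. The rack labels and helper names below are hypothetical and model only which replicas hold the write, not how Cassandra delivers it.

```python
def quorum(replication_factor):
    """Acknowledgements required for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def surviving_copies(holders, dead):
    """Replicas that still hold the write after the 'dead' replicas are lost."""
    return set(holders) - set(dead)


rf = 3
needed = quorum(rf)

# Hypothetical case 1: only the two acknowledging replicas ever stored the write.
acked = {"rack1", "rack2"}
lost_case = surviving_copies(acked, dead={"rack1", "rack2"})

# Hypothetical case 2: the third replica also received the write before the failures.
kept_case = surviving_copies(acked | {"rack3"}, dead={"rack1", "rack2"})

print(needed, lost_case, kept_case)
```

In the first case no copy survives unless hints or a repair delivered the write to the third replica before the two acknowledging nodes were lost.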

Re: When Replacing a Node, How to Force a Consistent Bootstrap

2017-12-05 Thread Fred Habash
Or, do a full repair after bootstrapping completes? On Dec 5, 2017 4:43 PM, "Jeff Jirsa" wrote: > You can't ask cassandra to stream from the node with the "most recent > data", because for some rows B may be most recent, and for others C may be > most recent - you'd have to stream from both (wh

Re: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster

2018-02-21 Thread Fred Habash
One node at a time On Feb 21, 2018 10:23 AM, "Carl Mueller" wrote: > What is your replication factor? > Single datacenter, three availability zones, is that right? > You removed one node at a time or three at once? > > On Wed, Feb 21, 2018 at 10:20 AM, Fd Habash wrote: > >> We have had a 15 nod

Re: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster

2018-02-21 Thread Fred Habash
RF of 3 with three racks (AZs) in a single region. On Feb 21, 2018 10:23 AM, "Carl Mueller" wrote: > What is your replication factor? > Single datacenter, three availability zones, is that right? > You removed one node at a time or three at once? > > On Wed, Feb 21, 2018 at 10:20 AM, Fd Habash wr

Re: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster

2018-02-23 Thread Fred Habash
. On Feb 21, 2018 1:29 PM, "Fred Habash" wrote: > One node at a time > > On Feb 21, 2018 10:23 AM, "Carl Mueller" > wrote: > >> What is your replication factor? >> Single datacenter, three availability zones, is that right? >> You removed o

Predicting Read/Write Latency as a Function of Total Requests & Cluster Size

2019-12-10 Thread Fred Habash
I'm looking for an empirical way to answer these two questions: 1. If I increase application workload (read/write requests) by some percentage, how is it going to affect read/write latency? Of course, all other factors remaining constant, e.g. ec2 instance class, ssd specs, number of nodes, etc. 2

Measuring Cassandra Metrics at the Session/Connection Level

2019-12-12 Thread Fred Habash
Hi all ... We are facing a scenario where we have to measure some metrics on a per-connection or per-client basis. For example, count of read/write requests by client IP/host/user/program. We want to know the source of C* requests for budgeting, capacity planning, or charge-backs. We are running 2.2

Soon After Starting c* Process: CPU 100% for java Process

2021-06-30 Thread Fred Habash
I have a node in the cluster where, when I start c*, the CPU reaches 100% with the java process on top. Within a few minutes, jvm crashes (jvm instability) messages in system.log and c* crashes. Once c* is up, cluster average read latency reaches multi-seconds and client apps are unhappy. For now, the only way out

Re: Soon After Starting c* Process: CPU 100% for java Process

2021-07-01 Thread Fred Habash
to get a better idea at what's happening at startup. > > On Thu, Jul 1, 2021 at 5:14 AM Fred Habash wrote: > >> I have node in cluster when I start c, the cpu reaches 100% with java >> process on top. Within a few minutes, jvm crashes (jvm instability) >> messages in

C* Java Clients Getting 'java.lang.IllegalStateException: Queue full'

2023-06-16 Thread Fred Habash
A java service client app reported getting this error message. Initially, I thought of it as C* emitting the error back to the client. But searching the C* logs (system/gc/debug) for 'queue full' or some variation of it returned zero instances. I have seen some log snippets on the web where [Cass

Re: C* Java Clients Getting 'java.lang.IllegalStateException: Queue full'

2023-06-19 Thread Fred Habash
Just wondering if my inquiry requires further details to warrant some interest. Hope someone else out there has had a similar experience. On Fri, Jun 16, 2023 at 2:55 PM Fred Habash wrote: > A java service client app reported getting this error message. Initially, > I thought of it

Cassandra ARM Support: What Version & What Download Links?

2025-03-11 Thread Fred Habash
Trying to understand when Apache Cassandra started supporting the ARM-64 architecture, specifically AWS Graviton. I have found multiple documents comparing C* performance on Intel vs. ARM, but without version details. My understanding is that to run C* on ARM-64, we must use C* >= 4.0. Furthermo

Re: Cassandra ARM Support: What Version & What Download Links?

2025-03-12 Thread Fred Habash
Any confirmation or feedback will be appreciated. Thanks On Tue, Mar 11, 2025 at 3:07 PM Fred Habash wrote: > Trying to understand when Apache Cassandra started supporting ARM-64 > architecture. Specifically, AWS Graviton. I have found multiple > documentation comparing C* performance