Re: Big Data Question

2023-08-21 Thread Jeff Jirsa
(Yes, just somewhat less likely to be the same order of speed-up in STCS, where sstables are more likely to cross token boundaries, modulo some stuff around sstable splitting at token ranges a la 6696.) On Mon, Aug 21, 2023 at 11:35 AM Dinesh Joshi wrote: > Minor correction, zero copy streaming aka …

Re: Big Data Question

2023-08-21 Thread Dinesh Joshi
Minor correction, zero copy streaming aka faster streaming also works for STCS. Dinesh On Aug 21, 2023, at 8:01 AM, Jeff Jirsa wrote: There's a lot of questionable advice scattered in this thread. Set aside most of the guidance like 2TB/node; it's old and super nuanced. If you're bare metal, do what …
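
For reference, the zero-copy ("entire sstable") streaming being discussed is controlled by a single cassandra.yaml setting in 4.0+; a minimal excerpt, assuming defaults elsewhere:

    # cassandra.yaml (4.0+) -- zero-copy streaming of whole sstables;
    # enabled by default, shown explicitly here
    stream_entire_sstables: true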

Re: Big Data Question

2023-08-21 Thread daemeon reiydelle
- k8s: 1. Depending on the version and networking, number of containers per node, nodepooling, etc., you can expect to see 1-2% additional storage IO latency (depends on whether all are on the same network vs. a separate storage IO TCP network). 2. System overhead may be 3-15% depending …

Re: Big Data Question

2023-08-21 Thread Patrick McFadin
...and a shameless plug for the Cassandra Summit in December. We have a talk from somebody who is doing 70TB per node and will be digging into all the aspects that make that work for them. I hope everyone in this thread is at that talk! I can't wait to hear all the questions. Patrick On Mon, Aug …

Re: Big Data Question

2023-08-21 Thread Jeff Jirsa
There's a lot of questionable advice scattered in this thread. Set aside most of the guidance like 2TB/node; it's old and super nuanced. If you're bare metal, do what your organization is good at. If you have millions of dollars in SAN equipment and you know how SANs work and fail and get backed up …

Re: Big Data Question

2023-08-21 Thread Joe Obernberger
For our scenario, the goal is to minimize downtime for a single (at least initially) data center system. Data loss is basically unacceptable. I wouldn't say we have a "rusty slow data center" - we can certainly use SSDs and have servers connected via 10G copper to a fast back-plane. For our …
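
A common way to make "no data loss, minimal downtime" concrete in a single data center is RF=3 with quorum reads and writes, which tolerates one replica down per token range; a sketch to run via cqlsh, with the keyspace and DC names as placeholders:

    -- keyspace and DC names are placeholders; RF=3 per the advice later in this thread
    CREATE KEYSPACE IF NOT EXISTS app
      WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};
    -- reads and writes issued at LOCAL_QUORUM then keep working with one replica offline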

RE: Big Data Question

2023-08-18 Thread Durity, Sean R via user
… even in a single data center scenario. Otherwise, there are other data options. Sean R. Durity DB Solutions Staff Systems Engineer – Cassandra INTERNAL USE From: daemeon reiydelle Sent: Thursday, August 17, 2023 7:38 PM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: Big Data Question I …

Re: Big Data Question

2023-08-17 Thread daemeon reiydelle
I started to respond, then realized I and the other OP posters are not thinking the same: What is the business case for availability, data loss/reload/recoverability? You all argue for higher availability and damn the cost. But no one asked "can you lose access, for 20 minutes, to a portion of the data …

Re: Big Data Question

2023-08-17 Thread Joe Obernberger
Was assuming Reaper did incremental? That was probably a bad assumption. nodetool repair -pr I know it well now! :) -Joe On 8/17/2023 4:47 PM, Bowen Song via user wrote: I don't have experience with Cassandra on Kubernetes, so I can't comment on that. For repairs, may I interest you with incremental …
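
For anyone following along: in 4.x, plain nodetool repair runs an incremental repair by default, while Reaper (if I recall correctly) schedules full repairs unless configured otherwise, which is likely the source of the bad assumption above. A sketch of the distinction, assuming 4.x flag semantics:

    nodetool repair -pr          # incremental repair, primary ranges only (4.x default mode)
    nodetool repair -full -pr    # full repair of this node's primary ranges
    nodetool repair -full        # full repair of all ranges this node replicates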

Re: Big Data Question

2023-08-17 Thread Bowen Song via user
I don't have experience with Cassandra on Kubernetes, so I can't comment on that. For repairs, may I interest you with incremental repairs? It will make repairs a hell of a lot faster. Of course, an occasional full repair is still needed, but that's another story. On 17/08/2023 21:36, Joe Obernberger …

Re: Big Data Question

2023-08-17 Thread Joe Obernberger
Thank you. Enjoying this conversation. Agree on blade servers, where each blade has a small number of SSDs. Yeh/Nah to a Kubernetes approach assuming fast persistent storage? I think that might be easier to manage. In my current benchmarks, the performance is excellent, but the repairs are …

Re: Big Data Question

2023-08-17 Thread Bowen Song via user
From my experience, that's not entirely true. For large nodes, the bottleneck is usually the JVM garbage collector. The GC pauses can easily get out of control on very large heaps, and long STW pauses may also result in nodes flipping up and down from other nodes' perspective, which often renders …
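
To make the heap point concrete: heap size and collector are set in conf/jvm11-server.options in 4.x; an illustrative excerpt, with the sizes as placeholder values to be tuned per workload rather than a recommendation:

    # conf/jvm11-server.options -- illustrative values only
    -Xms16G
    -Xmx16G
    # G1 tends to keep STW pauses bounded on larger heaps
    -XX:+UseG1GC
    -XX:MaxGCPauseMillis=300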

Re: Big Data Question

2023-08-17 Thread daemeon reiydelle
A lot of these (actually all) seem to be based on local nodes with 1Gb networks of spinning rust. Much of what is mentioned below is TOTALLY wrong for cloud. So clarify whether you are "real world" or rusty slow data center world (definitely not modern DC either). E.g. should not handle more than 2TB of …

Re: Big Data Question

2023-08-17 Thread Bowen Song via user
The optimal node size largely depends on the table schema and read/write pattern. In some cases 500 GB per node is too large, but in some other cases 10TB per node works totally fine. It's hard to estimate that without benchmarking. Again, just pointing out the obvious: you did not count the …
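
On "hard to estimate without benchmarking": cassandra-stress ships with Cassandra and is the usual first pass before committing to a node size; a minimal sketch, with the node address and counts as placeholders:

    cassandra-stress write n=1000000 -rate threads=64 -node 10.0.0.10
    cassandra-stress read  n=1000000 -rate threads=64 -node 10.0.0.10

A custom stress profile matching the real table schema gives far more representative numbers than the default schema used above.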

RE: Big Data Question

2023-08-17 Thread Durity, Sean R via user
… Sean R. Durity INTERNAL USE From: Joe Obernberger Sent: Thursday, August 17, 2023 10:46 AM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: Big Data Question Thanks for this - yeah - duh - forgot about replication in my example! So - is 2TB per Cassandra instance advisable? Better to …

Re: Big Data Question

2023-08-17 Thread C. Scott Andreas
A few thoughts on this: – 80TB per machine is pretty dense. Consider the amount of data you'd need to re-replicate in the event of a hardware failure that takes down all 80TB (DIMM failure requiring replacement, non-redundant PSU failure, NIC, etc.). – 24GB of heap is also pretty generous. Depending …
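
The re-replication cost is easy to put numbers on. A back-of-envelope sketch, assuming a sustained 10 Gbps (about 1.25 GB/s) of streaming throughput, which is optimistic in practice:

    80 TB / 1.25 GB/s = 64,000 s ≈ 17.8 hours

That is the best case to restore replicas after losing one 80TB machine; real streams rarely sustain line rate, and the cluster is running with reduced redundancy the whole time.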

Re: Big Data Question

2023-08-17 Thread Joe Obernberger
Thanks for this - yeah - duh - forgot about replication in my example! So - is 2TB per Cassandra instance advisable? Better to use more/less? Modern 2U servers can be had with 24 3.8TB SSDs; so assume 80TB per server, you could do: (1024*3)/80 = 39 servers, but you'd have to run 40 …

Re: Big Data Question

2023-08-17 Thread Bowen Song via user
Just pointing out the obvious: for 1PB of data on nodes with a 2TB disk each, you will need far more than 500 nodes. 1. It is unwise to run Cassandra with replication factor 1; it usually makes sense to use RF=3, so 1PB of data will cost 3PB of storage space, a minimum of 1500 such nodes. 2. Depending …
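
Putting numbers to points 1 and 2: a rough sketch, assuming RF=3 and that STCS can want up to roughly half the disk kept free for compaction (the exact headroom depends on the compaction strategy):

    1 PB x RF 3           = 3 PB of replicated data
    3 PB / 2 TB per node  = 1,500 nodes (disks 100% full)
    3 PB / (2 TB x 0.5)   = 3,000 nodes with ~50% compaction headroom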

Re: Big Data Question

2023-08-16 Thread Jeff Jirsa
A lot of things depend on actual cluster config - compaction settings (LCS vs STCS vs TWCS) and token allocation (single token, vnodes, etc.) matter a ton. With 4.0 and LCS, streaming for replacement is MUCH faster, so much so that most people should be fine with 4-8TB/node, because the rebuild time …
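
Context for "streaming for replacement": a dead node is typically rebuilt by starting a fresh one with the replace flag set, at which point the 4.0 streaming path does the heavy lifting; a sketch, with the address as a placeholder for the dead node's IP:

    # on the replacement node, in cassandra-env.sh (or as a -D JVM flag)
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=10.0.0.23"

The faster this bootstrap stream completes, the shorter the window of reduced redundancy, which is the argument for why higher per-node density became tolerable in 4.0.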