Re: removenode force vs assassinate

2019-03-11 Thread onmstester onmstester
You should first try removenode, which triggers cluster streaming; if removenode fails or gets stuck, assassinate is the last resort. Sent using https://www.zoho.com/mail/ On Mon, 11 Mar 2019 14:27:13 +0330 Ahmed Eljami wrote Hello, Can someone explain me the differenc

Re: removenode force vs assassinate

2019-03-11 Thread onmstester onmstester
The only option to stream a decommissioned node's data is to run "nodetool decommission" on that node (while Cassandra is running on it). removenode only streams data from the node's replicas, so any data stored only on the decommissioned node would be lost. You should monitor

Can I delete an sstable with estimated droppable tombstones > 1 manually?

2019-03-19 Thread onmstester onmstester
Running: sstablemetadata /THE_KEYSPACE_DIR/mc-1421-big-Data.db, the result was: Estimated droppable tombstones: 1.2. Having STCS and data disk usage of 80% (not enough free space for a normal compaction), is it OK to just: 1. stop Cassandra, 2. delete mc-1421* and then 3. start Cassandra?

Re: gc_grace config for time series database

2019-04-17 Thread onmstester onmstester
I do not use a table default TTL (every row has its own TTL) and no updates occur to the rows. I suppose that (because of the immutable nature of everything in Cassandra) Cassandra keeps only the insertion timestamp + the original TTL, and computes the TTL of a row using these two and the current
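The reasoning in this thread can be sketched as a tiny check. This is a minimal model, not Cassandra's internal implementation, and the names are illustrative:

```python
# A tiny model of the reasoning above (illustrative names, not Cassandra's
# internal field names): a TTL'd cell's liveness is fully determined by its
# write timestamp and original TTL, so nothing else needs to be stored.
def is_live(write_time_s: int, ttl_s: int, now_s: int) -> bool:
    # The cell expires exactly at write_time_s + ttl_s.
    return now_s < write_time_s + ttl_s

# A row written at t=1000 with a 3600s TTL is live at t=4599, expired at t=4600.
```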

when the "delete statement" would be deleted?

2019-04-24 Thread onmstester onmstester
Just deleted multiple partitions from one of my tables; dumping sstables shows the data was successfully deleted, but the 'marked_deleted' rows for each partition still exist in the sstable and take up storage. Is there any way to get rid of this delete-statement storage overhead (everyt

Re: When would the "delete statement" itself be deleted?

2019-04-24 Thread onmstester onmstester
Found the answer: it is deleted after gc_grace. I decreased gc_grace, ran a compaction, and the "marked_deleted" partitions were purged from the sstable. On Wed, 24 Apr 2019 14:15:33 +0430 onmstester onmstester wrote Just delete
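A minimal sketch of the purge rule behind this answer, with illustrative names; the constant matches Cassandra's stock default for `gc_grace_seconds`:

```python
# Illustrative form of the purge rule: a tombstone can be dropped by
# compaction only once gc_grace_seconds have elapsed since the deletion,
# which is why lowering gc_grace and compacting removed the markers.
DEFAULT_GC_GRACE_S = 10 * 24 * 3600  # Cassandra's default: 10 days = 864000s

def purgeable(deletion_time_s: int, gc_grace_s: int, now_s: int) -> bool:
    return now_s >= deletion_time_s + gc_grace_s
```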

How to set up a cluster with allocate_tokens_for_keyspace?

2019-05-04 Thread onmstester onmstester
I just read this article by tlp: https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html   Noticed that: >>We will need to set the tokens for the seed nodes in each rack manually. This >>is to prevent each node from randomly calculating its own token ranges

Fwd: Re: How to set up a cluster with allocate_tokens_for_keyspace?

2019-05-04 Thread onmstester onmstester
't need to specify tokens anymore, you can just use allocate_tokens_for_keyspace. On Sat, May 4, 2019 at 2:14 AM onmstester onmstester <mailto:onmstes...@zoho.com.invalid> wrote: > > I just read this article by tlp: > https://thelastpickle.com/blog/2019/02/21/se

Re: How to set up a cluster with allocate_tokens_for_keyspace?

2019-05-05 Thread onmstester onmstester
e vnodes - number of token per node and the number of racks. Regards, Anthony On Sat, 4 May 2019 at 19:14, onmstester onmstester <mailto:onmstes...@zoho.com.invalid> wrote: I just read this article by tlp: https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with

Re: Cassandra.Link Knowledge Base - v. 0.4

2019-07-21 Thread onmstester onmstester
Thank you all! Sent using https://www.zoho.com/mail/ On Sat, 20 Jul 2019 16:13:29 +0430 Rahul Singh wrote Hey Cassandra community , Thanks for all the feedback in the past on my cassandra knowledge base project. Without the feedback cycle it’s not really for the community. 

Cassandra cluster: could not reach linear scalability

2018-02-18 Thread onmstester onmstester
I've configured a simple cluster using two PCs with identical specs: CPU: Core i5, RAM: 8GB DDR3, Disk: 1TB 5400rpm, Network: 1G (I've tested it with iperf, it really is!), using the common configs described on many sites including DataStax itself: cluster_name: 'MyCassandraCluster' num_tokens: 256 se

Cassandra data model too many table

2018-02-18 Thread onmstester onmstester
I have a single structured row as input at a rate of 10K per second. Each row has 20 columns. Some queries must be answered over these inputs. Because most queries need different WHERE, GROUP BY, or ORDER BY clauses, the final data model ended up like this: primary key for the table of query1: ((colum

Re: Cassandra cluster: could not reach linear scalability

2018-02-18 Thread onmstester onmstester
start. -- Rahul Singh rahul.si...@anant.us Anant Corporation On Feb 18, 2018, 6:29 AM -0500, onmstester onmstester <onmstes...@zoho.com>, wrote: I've configured a simple cluster using two PC with identical spec: cpu core i5 RAM: 8GB ddr3 Disk: 1TB 5400rpm Network: 1 G (I'

Re: Cassandra cluster: could not reach linear scalability

2018-02-18 Thread onmstester onmstester
re chokepoints in the GC cycle. -- Rahul Singh rahul.si...@anant.us Anant Corporation On Feb 18, 2018, 9:23 AM -0500, onmstester onmstester <onmstes...@zoho.com>, wrote: But monitoring cassandra with jmx using jvisualVM shows no problem, less than 30% of heap size u

Re: Cassandra cluster: could not reach linear scalability

2018-02-19 Thread onmstester onmstester
8:29 GMT-03:00 onmstester onmstester <onmstes...@zoho.com>: I've configured a simple cluster using two PC with identical spec: cpu core i5 RAM: 8GB ddr3 Disk: 1TB 5400rpm Network: 1 G (I've test it with iperf, it really is!) using the common configs described in many site

Re: Right sizing Cassandra data nodes

2018-02-23 Thread onmstester onmstester
Another question on node density, in this scenario: 1. we must keep some years of time series data for a heavy-write system in Cassandra (>10K ops per second) 2. the system is insert-only and inserted data is never updated 3. in the partition key we used the number of months since 1970, so da

hardware sizing for insert only scenarios

2018-02-26 Thread onmstester onmstester
Another question on node density, in this scenario: 1. we must keep some years of time series data for a heavy-write system in Cassandra (>10K ops per second) 2. the system is insert-only and inserted data is never updated 3. in the partition key we used the number of months since 1970, so
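The month-since-1970 partition key described in point 3 can be sketched as follows; the function name is illustrative:

```python
from datetime import date

# The partition-key scheme described in point 3: whole months elapsed since
# January 1970, so each month's data lands in its own partition.
def month_bucket(d: date) -> int:
    return (d.year - 1970) * 12 + (d.month - 1)

# e.g. January 1970 -> bucket 0, February 1970 -> bucket 1
```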

Cassandra on high performance machine: virtualization vs Docker

2018-02-27 Thread onmstester onmstester
What I've got to set up my Apache Cassandra cluster are some servers with a 20-core CPU * 2 threads, 128 GB RAM and 8 * 2TB disks. Having read all over the web "do not use big nodes for your cluster", I'm convinced to run multiple nodes on a single physical server. So the question is which techno

Re: Cassandra on high performance machine: virtualization vs Docker

2018-02-28 Thread onmstester onmstester
o 1.415.501.0198 London 44 020 8144 9872 On Tue, Feb 27, 2018 at 8:26 PM, onmstester onmstester <onmstes...@zoho.com> wrote: What i've got to set up my Apache Cassandra cluster are some Servers with 20 Core cpu * 2 Threads and 128 GB ram and 8 * 2TB disk. Just r

cfhistograms InstanceNotFoundException EstimatePartitionSizeHistogram

2018-03-06 Thread onmstester onmstester
Running this command: nodetool cfhistograms keyspace1 table1 throws this exception on the production server: javax.management.InstanceNotFoundException: org.apache.cassandra.metrics:type=Table,keyspace=keyspace1,scope=table1,name=EstimatePartitionSizeHistogram But I have no problem on a test s

data types storage saving

2018-03-06 Thread onmstester onmstester
I'm using the int data type for one of my columns, but for 99.99...% of rows its value will never be > 65K. Should I change it to smallint (it would save some gigabytes of disk in a few months), or will Cassandra's compression take care of it in storage? What about the blob data type? Isn't it better to use it in s
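A back-of-envelope estimate of the raw (pre-compression) saving from a 4-byte int versus a 2-byte smallint; the insert rate below is borrowed from other threads on this page and is only an example figure:

```python
# Back-of-envelope saving from switching a 4-byte int column to a 2-byte
# smallint, before compression. The row rate is an example figure only.
INT_BYTES, SMALLINT_BYTES = 4, 2

def raw_saving_bytes(rows: int) -> int:
    # Each row saves the difference between the two fixed-width encodings.
    return rows * (INT_BYTES - SMALLINT_BYTES)

# One month of inserts at 10K rows/second:
rows = 10_000 * 86_400 * 30
print(raw_saving_bytes(rows) / 1e9, "GB")  # ~51.8 GB before compression
```

Whether compression recovers most of this in practice depends on the data, which is exactly the question the email asks.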

Re: cfhistograms InstanceNotFoundException EstimatePartitionSizeHistogram

2018-03-06 Thread onmstester onmstester
t from my iPhone On Mar 6, 2018, at 3:29 AM, onmstester onmstester <onmstes...@zoho.com> wrote: Running this command: nodetools cfhistograms keyspace1 table1 throws this exception in production server: javax.management.InstanceNotFoundException: org.apache.cassandra.metri

backup/restore cassandra data

2018-03-07 Thread onmstester onmstester
Would it be possible to copy/paste the Cassandra data directory from one of the nodes (whose OS partition is corrupted) and use it on a fresh Cassandra node? I used RF=1, so that's my only chance!

Re: backup/restore cassandra data

2018-03-08 Thread onmstester onmstester
apshot_restore_t.html#ops_backup_snapshot_restore_t Cheers Ben On Thu, 8 Mar 2018 at 17:07 onmstester onmstester <onmstes...@zoho.com> wrote: -- Ben Slater Chief Product Officer Read our latest technical blog posts here. This email has been sent on behalf of Instaclustr Pty. Limited (Aust

Re: data types storage saving

2018-03-09 Thread onmstester onmstester
arl.muel...@smartthings.com> wrote If you're willing to do the data type conversion in insert and retrieval, the you could use blobs as a sort of "adaptive length int" AFAIK On Tue, Mar 6, 2018 at 6:02 AM, onmstester onmstester <onmstes...@zoho.com> wrote: I&

Re: data types storage saving

2018-03-10 Thread onmstester onmstester
Could I calculate disk usage approximately (without inserting actual data)? On Sat, 10 Mar 2018 11:21:44 +0330 onmstester onmstester <onmstes...@zoho.com> wrote I've found out that blobs give no gain in storage saving! I had some 16 digit number

yet another benchmark bottleneck

2018-03-11 Thread onmstester onmstester
I'm going to benchmark Cassandra's write throughput on a node with the following spec: CPU: 20 cores, Memory: 128 GB (32 GB as Cassandra heap), Disk: 3 separate disks for OS, data and commitlog, Network: 10 Gb (tested with iperf), OS: Ubuntu 16. Running cassandra-stress: cassandra-stress write n=100

Re: yet another benchmark bottleneck

2018-03-11 Thread onmstester onmstester
e host test? On Sun, Mar 11, 2018 at 10:44 PM, onmstester onmstester <onmstes...@zoho.com> wrote: I'm going to benchmark Cassandra's write throughput on a node with following spec: CPU: 20 Cores Memory: 128 GB (32 GB as Cassandra heap) Disk: 3 seprate disk for OS, data an

Re: yet another benchmark bottleneck

2018-03-12 Thread onmstester onmstester
On Mon, 12 Mar 2018 09:34:26 +0330 onmstester onmstester <onmstes...@zoho.com> wrote Apache-cassandra-3.11.1 Yes, I'm doing a single-host test On Mon, 12 Mar 2018 09:24:04 +0330 Jeff Jirsa <jji...@gmail.com> wrote

RE: yet another benchmark bottleneck

2018-03-12 Thread onmstester onmstester
-Henri Berthemet From: onmstester onmstester [mailto:onmstes...@zoho.com] Sent: Monday, March 12, 2018 10:48 AM To: user <user@cassandra.apache.org> Subject: Re: yet another benchmark bottleneck Running two instance of Apache Cassandra on same server, each having their own

RE: yet another benchmark bottleneck

2018-03-12 Thread onmstester onmstester
On Mon, 12 Mar 2018 14:25:12 +0330 Jacques-Henri Berthemet <jacques-henri.berthe...@genesys.com> wrote Any errors/warning in Cassandra logs? What’s your RF? Using 300MB/s of network bandwidth for only 130 op/s looks very high. -- Jacques-Henri Berthemet From:

RE: yet another benchmark bottleneck

2018-03-12 Thread onmstester onmstester
rrent_writes: 32 concurrent_counter_writes: 32 Jumping directly to 160 would be a bit high with spinning disks, maybe start with 64 just to see if it gets better. -- Jacques-Henri Berthemet From: onmstester onmstester [mailto:onmstes...@zoho.com] Sent: Monday, March 12, 2018 12:08 PM

RE: yet another benchmark bottleneck

2018-03-12 Thread onmstester onmstester
on another host? -- Jacques-Henri Berthemet From: onmstester onmstester [mailto:onmstes...@zoho.com] Sent: Monday, March 12, 2018 12:50 PM To: user <user@cassandra.apache.org> Subject: RE: yet another benchmark bottleneck no luck even with 320 threads for write

Re: yet another benchmark bottleneck

2018-03-12 Thread onmstester onmstester
There are a few known issues in the write path that prevent scaling with high CPU core counts. - Micke On 03/12/2018 03:14 PM, onmstester onmstester wrote: > I mentioned that I already tested increasing client threads + many > stress-client instances in one node + two s

Cluster of small clusters

2019-11-16 Thread onmstester onmstester
Each Cassandra node creates 6 separate threads for incoming and outgoing streams to other nodes in the cluster. So with big clusters, for example 100 nodes, that would be more than 600 threads running in each Cassandra app, which would cause performance problems, so it is better to have multiple small clu
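The arithmetic behind the claim, taking the email's figure of roughly 6 streaming/messaging threads per peer node at face value:

```python
# Thread count grows linearly with cluster size: ~6 threads per peer,
# and every node maintains connections to every other node. The constant
# 6 is taken from the email above, not measured here.
def peer_threads(cluster_size: int, threads_per_peer: int = 6) -> int:
    return threads_per_peer * (cluster_size - 1)

# A 100-node cluster costs each node roughly 600 such threads.
```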

cassandra collection best practices and performance

2020-01-07 Thread onmstester onmstester
What is the sweet spot for set and list item counts (in DataStax's documents, the max is 2 billion)? Write and read performance of Set vs List vs simple partition rows? Thanks in advance

bug in cluster key push down

2020-01-12 Thread onmstester onmstester
Using Apache Cassandra 3.11.2, defined a table like this: create table my_table (partition text, clustering1 int, clustering2 text, data set, primary key (partition, clustering1, clustering2)) and con

Re: bug in cluster key push down

2020-01-12 Thread onmstester onmstester
Sent from my iPhone On Jan 12, 2020, at 6:04 AM, onmstester onmstester <mailto:onmstes...@zoho.com.invalid> wrote: Using Apache Cassandra 3.11.2, defined a table like this: create table my_table (partition text, clusterin

Re: bug in cluster key push down

2020-01-12 Thread onmstester onmstester
rows, so I suppose that the clustering key restrictions have been pushed down to the storage engine. Thanks Jeff On Mon, 13 Jan 2020 08:38:44 +0330 onmstester onmstester <mailto:onmstes...@zoho.com.INVALID> wrote Done. https://issues.apache

Fwd: Re: [Discuss] num_tokens default in Cassandra 4.0

2020-02-03 Thread onmstester onmstester
Sorry if it's trivial, but I do not understand how num_tokens affects availability: with RF=3 and CLW,CLR=quorum, the cluster can tolerate losing at most one node, and all of the tokens assigned to that node are also assigned to two other nodes no matter what num_tokens is, right?

Fwd: Re: [Discuss] num_tokens default in Cassandra 4.0

2020-02-03 Thread onmstester onmstester
out node 1 & 4, then ranges B & L would no longer meet CL=quorum;  but you can do that in the top diagram, since there are no ranges shared between node 1 & 4. Hope that helps. - Max On Feb 3, 2020, at 8:39 pm, onmstester onmstester <mailto:onmstes...@zoho.com.INVALI

Cassandra crashes when using offheap_objects for memtable_allocation_type

2020-06-01 Thread onmstester onmstester
I just changed these properties to increase the flushed file size (decrease the number of compactions): memtable_allocation_type from heap_buffers to offheap_objects; memtable_offheap_space_in_mb from the default (2048) to 8192. Using default values for the other memtable/compaction/commitlog configurations.

Re: Running Large Clusters in Production

2020-07-10 Thread onmstester onmstester
Yes, you should handle the routing logic at the app level. I wish there were another level of sharding (above dc and rack), a "cluster" level, to distribute data across multiple clusters! But I don't think any other database does such a thing for you either. Another problem with big clusters is the huge amount

Relation between num_tokens and cluster extension limitations

2020-07-13 Thread onmstester onmstester
Hi, I'm using allocate_tokens_for_keyspace and num_tokens=32, and I want to extend some clusters. I read in articles that with num_tokens=4 one should add 25% more nodes for the cluster to become balanced again. 1. For example, with num_tokens=4 and already having 16 n

Re: design principle to manage roll back

2020-07-14 Thread onmstester onmstester
Hi, I think Cassandra alone is not suitable for your use case. You could use a mix of a distributed/NoSQL store (for storing the single records of whatever makes your input big data) and a relational/single database (for the transactional, non-big-data part).

Multi DCs vs Single DC performance

2020-07-28 Thread onmstester onmstester
Hi, logically I do not need multiple DCs (the cluster is not geographically separated), but I wonder if splitting the cluster into two halves (two separate DCs) would decrease the overhead of node ack/communication and result in better (write) performance?

Re: Multi DCs vs Single DC performance

2020-07-28 Thread onmstester onmstester
rding in a way I havent personally figured out yet (maybe if you had a very high replica count per DC, then using forwarding and EACH_QUORUM may get fun, but you'd be better off dropping the replica count than coming up with stuff like this). On Tue, Jul 28, 2020 at 8:27 PM onmstester

streaming stuck on joining a node with TBs of data

2020-07-31 Thread onmstester onmstester
Hi, I'm going to join multiple new nodes to an existing, running cluster. Each node should stream in >2TB of data, and it took a few days (with 500Mb streaming) to almost finish. But it got stuck streaming in from one final node, and I cannot see any bottleneck on either side (sourc

Re: streaming stuck on joining a node with TBs of data

2020-07-31 Thread onmstester onmstester
No secondary index, no SASI, no materialized view. On Sat, 01 Aug 2020 11:02:54 +0430 Jeff Jirsa wrote Are there secondary indices involved?  On Jul 31, 2020, at 10:51 PM, onmstester onmstester <mailto:onmstes...@zoho.com.inva

Re: streaming stuck on joining a node with TBs of data

2020-08-01 Thread onmstester onmstester
own risk). On Jul 31, 2020, at 11:46 PM, onmstester onmstester <mailto:onmstes...@zoho.com.invalid> wrote: No Secondary index, No SASI, No materialized view Sent using https://www.zoho.com/mail/ On Sat, 01 Aug 2020 11:02:54 +0430 Jeff Jirsa <mailto:jji...@gmail.com>

Fwd: Re: streaming stuck on joining a node with TBs of data

2020-08-03 Thread onmstester onmstester
Is there any configuration in Cassandra to force streamed-in data through the memtable/sstable cycle, to get bigger sstables in the first place? Forwarded message From: onmstester onmstester To: "user" Date: Sun, 02 Aug 2020 08:35:3

Re: Re: streaming stuck on joining a node with TBs of data

2020-08-04 Thread onmstester onmstester
e.g. if you're using LCS, change sstable size from 160M to something higher), but there's no magic to join / compact those data files on the sending side before sending. On Mon, Aug 3, 2020 at 4:15 AM onmstester onmstester <mailto:onmstes...@zoho.com.invalid> wrote: IMHO (readi

Re: Re: streaming stuck on joining a node with TBs of data

2020-08-05 Thread onmstester onmstester
er by sending bigger sstables at sending side or by merging sstables in memtable at receiving side) (Just fixed a wrong word in my previous question) On Wed, 05 Aug 2020 10:02:51 +0430 onmstester onmstester <mailto:onmstes...@zoho.com.INVALID> wrote OK. Thanks I'm using STC

Re: data modeling qu: use a Map datatype, or just simple rows... ?

2020-09-18 Thread onmstester onmstester
I used Cassandra Set (no experience with Map), and one thing for sure is that with Cassandra collections you are limited to a few thousand entries per row (fewer than 10K for better performance). On Fri, 18 Sep 2020 20:33:21 +0430 Attila Wind wrote

Re: Node is UNREACHABLE after decommission

2020-09-19 Thread onmstester onmstester
Another workaround that I used for the UNREACHABLE nodes problem is to restart the whole cluster, which fixes it, but I don't know whether that causes any problems or not. On Fri, 18 Sep 2020 01:19:35 +0430 Paulo Motta wrote Oh, if you're adding t

dropped mutations cross node

2020-09-21 Thread onmstester onmstester
Hi, I've extended a cluster by 10%, and since then, each hour, on some of the nodes (which change randomly each time), "dropped mutations cross node" appears in the logs (each time 1 or 2 drops, and sometimes some thousands, with cross-node latency from 3000ms to 9ms or 90 seconds!) and insert r

Re: dropped mutations cross node

2020-10-05 Thread onmstester onmstester
Thanks. I've made a lot of config changes to fix the problem but nothing worked (the last was disabling hints), and after a few days the problem was gone!! The source of droppedCrossNode changed every half hour and it was not always the new nodes. No difference between new nodes and old ones in c

reducing RF when using the token allocation algorithm

2020-10-26 Thread onmstester onmstester
Hi, I've set up a cluster with: 3.11.2, 30 nodes, RF=3, single DC, NetworkTopologyStrategy. Now I'm going to reduce RF to 2, but I set up the cluster with vnodes=16 and the allocation algorithm (allocate_tokens_for_keyspace) for the main keyspace (whose RF I'm reducing), so is the procedure still: 1. alter

OOM on ccm with large cluster on a single node

2020-10-27 Thread onmstester onmstester
Hi, I'm using ccm to create a cluster of 80 nodes on a physical server with 10 cores and 64GB of RAM, but the 43rd node always fails to start with the error: java.lang.OutOfMemoryError: unable to create new native thread. Apache Cassandra 3.11.2, Cassandra Xmx 600M, 30GB of memory is still free

local read from coordinator

2020-11-10 Thread onmstester onmstester
Hi, I'm going to read all the data in the cluster as fast as possible. I'm aware that Spark can do such things out of the box, but I wanted to do it at a low level to see how fast it could be. So: 1. retrieved the partition keys on each node using nodetool ring token ranges and getting distinct p
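Step 1 of the described approach could look like the following hypothetical helper, which turns a node's token ranges into one per-range CQL query; the keyspace, table and column names are made up for illustration:

```python
# Hypothetical helper: turn a node's primary token ranges (e.g. gathered
# from `nodetool ring`) into one CQL query per range, so each query can be
# issued against the replica that owns that range. Names are illustrative.
def range_queries(table: str, pk: str, ranges):
    return [
        f"SELECT * FROM {table} WHERE token({pk}) > {lo} AND token({pk}) <= {hi}"
        for lo, hi in ranges
    ]

qs = range_queries("ks.events", "partition", [(-100, 0), (0, 100)])
```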

Fwd: Re: local read from coordinator

2020-11-11 Thread onmstester onmstester
Thanks, but I'm OK with the coordinator part; actually I was looking for a kind of read CL that forces reading from the coordinator only, with no connections to other nodes! Forwarded message From: Alex Ott To: "user" Date: Wed

Re: local read from coordinator

2020-11-13 Thread onmstester onmstester
chosen in practice) On Nov 11, 2020, at 3:46 AM, Alex Ott <mailto:alex...@gmail.com> wrote: if you force routing key, then the replica that owns the data will be selected as coordinator On Wed, Nov 11, 2020 at 12:35 PM onmstester onmstester <mailto:onmstes...@zoho.com.inval

Fwd: Re: local read from coordinator

2020-11-14 Thread onmstester onmstester
using https://www.zoho.com/mail/ Forwarded message From: onmstester onmstester To: "user" Date: Sat, 14 Nov 2020 08:24:14 +0330 Subject: Re: local read from coordinator Forwarded message Thank you Jeff, I disabled dynami

number of racks in a deployment with VMs

2021-02-14 Thread onmstester onmstester
Hi, in an article by The Last Pickle [1], I noticed: The key here is to configure the cluster so that for a given datacenter the number of racks is the same as the replication factor. When using virtual machines as Cassandra nodes, we have to set up the cluster in such a way that the number of racks is

using zstd causes high memtable switch count

2021-02-28 Thread onmstester onmstester
Hi, I'm using 3.11.2; I just added the patch for zstd and changed table compression from the default (LZ4) to zstd with level 1 and chunk 64kb. Everything is fine (disk usage decreased by 40% and CPU usage is almost the same as before); only the memtable switch count changed dramatically: with lz

Fwd: Re: using zstd causes high memtable switch count

2021-02-28 Thread onmstester onmstester
n Sun, Feb 28, 2021 at 9:22 PM onmstester onmstester <mailto:onmstes...@zoho.com.invalid> wrote: Hi, I'm using 3.11.2, just add the patch for zstd and changed table compression from default (LZ4) to zstd with level 1 and chunk 64kb, everything is fine (disk usage decreased by 40%

Re: What Happened To Alternate Storage And Rocksandra?

2021-03-12 Thread onmstester onmstester
Besides the enhancements at the storage layer, I think there are a couple of good ideas in RocksDB that could be used in Cassandra, like disabling sorting on the memtable-insert path (write data fast, like a commitlog) and only sorting the data when flushing/creating sst files.

Re: Question about the num_tokens

2021-04-28 Thread onmstester onmstester
Some posts/papers discuss this in more detail, for example the one from The Last Pickle: https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html Which says: Using statistical computation, the point where all clusters of any size always had a good token rang

Re: New Servers - Cassandra 4

2021-08-10 Thread onmstester onmstester
Hi, what about this type of blade, which gives you about 12 (commodity) servers in 3U: https://www.supermicro.com/en/products/microcloud On Tue, 03 Aug 2021 02:01:13 +0430 Joe Obernberger wrote Thank you Max.  That is a solid choice. 

Separating storage and processing

2021-11-14 Thread onmstester onmstester
Hi, in our Cassandra cluster, because of big rows in the input data/data model with a TTL of several months, we ended up using almost 80% of storage (5TB per node) but less than 20% of CPU, almost all of which goes to writing rows to memtables and compacting sstables, so a lot of CP

gc throughput

2021-11-14 Thread onmstester onmstester
Hi, we are using Apache Cassandra 3.11.2 with its default GC configuration (CMS, etc.) on a 16GB heap. I inspected the GC logs using GCViewer and it reported 92% throughput. Does that mean no further GC tuning is necessary, and everything is OK with Cassandra's GC?
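For context, a GC "throughput" percentage is the share of wall-clock time the JVM spent running the application rather than paused for collection, so 92% implies about 8% of time lost to pauses. A sketch of that arithmetic:

```python
# GC throughput = application time / total time, as a percentage.
# 8 seconds of GC pauses over a 100-second window -> 92% throughput.
def gc_throughput_pct(window_s: float, gc_pause_s: float) -> float:
    return 100.0 * (window_s - gc_pause_s) / window_s
```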

Re: Separating storage and processing

2021-11-15 Thread onmstester onmstester
I can, but I thought 5TB per node already violates best practices (1-2TB per node); wouldn't it be a bad idea to 2X or 3X that? On Mon, 15 Nov 2021 20:55:53 +0330 wrote It sounds like you can downsize your cluster but increase your dri

Re: Separating storage and processing

2021-11-15 Thread onmstester onmstester
Thank You Sent using https://www.zoho.com/mail/ On Tue, 16 Nov 2021 10:00:19 +0330 wrote > I can, but i thought with 5TB per node already violated best practices (1-2 >TB per node) and won't be a good idea to 2X or 3X that? The main downside of larger disks is that it takes

Re: gc throughput

2021-11-17 Thread onmstester onmstester
related to GC, regardless of what the GC metric you are looking at says, you will need to address the issue, and that will probably involve some GC tuning. On 15/11/2021 06:00, onmstester onmstester wrote: Hi, We are using Apache Cassandra 3.11.2 with its

Problem on setup Cassandra v4.0.1 cluster

2022-01-31 Thread onmstester onmstester
Hi, I'm trying to set up a cluster of Apache Cassandra version 4.0.1 with 2 nodes: 1. on node1 (192.168.1.1), extracted the tar.gz and configured these in the yml: - seeds: "192.168.1.1" listen_address: 192.168.1.1 rpc_address: 192.168.1.1 2. started node1 and a few seconds later it is UN 3. on

Fwd: Re: Problem on setup Cassandra v4.0.1 cluster

2022-01-31 Thread onmstester onmstester
Once again it was related to hostname configuration (I remember having problems with this multiple times before, even on different applications); this time the root cause was a typo in one of multiple config files for the hostname (a different name in /etc/hostname vs /etc/hosts)! I fixed that and now th

Cassandra internal bottleneck

2022-02-05 Thread onmstester onmstester
Hi, I'm trying to evaluate the performance of Apache Cassandra v4.0.1 for write-only workloads using on-premise physical servers. On a single-node cluster, with some optimizations I was able to push node CPU above 90%; throughput is high enough and CPU is the bottleneck, as I expected. Then doing

Fwd: Re: Cassandra internal bottleneck

2022-02-05 Thread onmstester onmstester
Thanks, I've got only one client, 10 threads and 1K async writes. This single client was able to send 110K inserts/second to the single-node cluster, but it's only sending 90K inserts/second to the cluster with 2 nodes (client CPU/network usage is less than 20%).

TLS/SSL overhead

2022-02-05 Thread onmstester onmstester
Hi, has anyone measured the impact of wire encryption using TLS (client_encryption/server_encryption) on cluster latency/throughput? It may depend on hardware or even the data model, but I already did some measurements and got about 2% for client encryption and 3-5% for client + server encry

Re: TLS/SSL overhead

2022-02-07 Thread onmstester onmstester
ndra maintains persistent connections therefore the visible impact is on connection establishment time (TLS handshake is expensive). Encryption will make thundering herd problems worse. You should watch out for those two issues. Dinesh On Feb 5, 2022, at 3:53 AM, onmstester onmstester <m

slow compactions

2022-03-06 Thread onmstester onmstester
Hi, sometimes compactions get so slow (a few KBs per second for each compaction) on a few nodes, which is fixed temporarily by restarting Cassandra (although it comes back a few hours later). I copied the sstables related to slow compactions to an isolated single-node Cassan

Re: slow compactions

2022-03-06 Thread onmstester onmstester
Forgot to mention that I'm using the default STCS for all tables. On Sun, 06 Mar 2022 12:29:52 +0330 onmstester onmstester wrote Hi, Sometimes compactions getting so slow (a few KBs per second for each compaction) on a few nodes which would be fixed temporarily by resta

Re: Compaction task priority

2022-09-02 Thread onmstester onmstester via user
I was there too! And found nothing to work around it except stopping big/unnecessary compactions manually (using nodetool stop) whenever they appear, via some shell scripts (using crontab). On Fri, 02 Sep 2022 10:59:22 +0430 Gil Ganz wrote ---

Re: Compaction task priority

2022-09-02 Thread onmstester onmstester via user
+0430 onmstester onmstester via user wrote --- I was there too! and found nothing to work around it except stopping big/unnecessary compactions manually (using nodetool stop) whenever they appears by some shell scrips (using crontab) Sent using https://www.zoho.com/mail/ On Fri, 02 Sep

Re: Compaction task priority

2022-09-06 Thread onmstester onmstester via user
PM Jim Shaw <mailto:jxys...@gmail.com> wrote: if capacity allowed,  increase compaction_throughput_mb_per_sec as 1st tuning,  and if still behind, increase concurrent_compactors as 2nd tuning. Regards, Jim On Fri, Sep 2, 2022 at 3:05 AM onmstester onmstester via user <ma

Re: Using zstd compression on Cassandra 3.x

2022-09-12 Thread onmstester onmstester via user
I patched this on 3.11.2 easily: 1. build the jar file from source and put it in the cassandra/lib directory 2. restart the cassandra service 3. alter the table to use zstd compression and rebuild sstables. But that was at a time when 4.0 was not yet available, and after that I upgraded to 4.0 immediately.

Re: Fwd: Re: Problem on setup Cassandra v4.0.1 cluster

2022-10-08 Thread onmstester onmstester via user
erver timestamp) are common due to manual config, or no one thought such problems could prevent a Cassandra node from joining the cluster! On Mon, 31 Jan 2022 16:35:50 +0330 onmstester onmstester wrote --- Once again it was related to hos

RE: Best compaction strategy for rarely used data

2023-01-06 Thread onmstester onmstester via user
Another solution: distribute the data across more tables. For example, you could create multiple tables based on the value or hash bucket of one of the columns; by doing this, the current data volume and compaction overhead is divided by the number of underlying tables. Although there is a limitation on n
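The suggested sharding-by-table scheme can be sketched as a small routing helper; all names here are illustrative, not part of any Cassandra API:

```python
import zlib

# Route each row to one of N structurally identical tables by hashing one
# column, so data volume and compaction overhead are split across N tables.
def hash_bucket(key: str, buckets: int) -> int:
    # crc32 is stable across processes, unlike Python's builtin hash()
    return zlib.crc32(key.encode()) % buckets

def target_table(base: str, key: str, buckets: int) -> str:
    return f"{base}_{hash_bucket(key, buckets)}"

table = target_table("events", "sensor-42", 8)  # one of events_0 .. events_7
```

Reads for a given key must compute the same bucket, so the bucketing column has to be part of every lookup.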

Fwd: Re: Cassandra uneven data repartition

2023-01-06 Thread onmstester onmstester via user
Isn't there a very big (>40GB) sstable in /volumes/cassandra/data/data1? If there is, you could split it or change your data model to prevent such sstables. Forwarded message From: Loïc CHANEL via user To: Date: Fri, 06 J
