You should first try removenode, which triggers cluster streaming; if
removenode fails or gets stuck, assassinate is the last resort.
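As a command sketch, the escalation path above looks like this (the host ID and IP are placeholders for your own node's values):

```shell
# Preferred: on the leaving node itself, while Cassandra is still running
nodetool decommission

# If the node is already dead: from any live node (streams from surviving replicas)
nodetool removenode <host-id>

# Last resort only -- skips streaming entirely, may lose data
nodetool assassinate <ip-address>
```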
Sent using https://www.zoho.com/mail/
On Mon, 11 Mar 2019 14:27:13 +0330 Ahmed Eljami
wrote
Hello,
Can someone explain to me the differenc
The only option to stream the decommissioned node's data is to run "nodetool
decommission" on the decommissioned node itself (while Cassandra is running on
the node).
removenode only streams data from the node's replicas, so any data stored only
on the decommissioned node would be lost.
You should monitor
Running:
sstablemetadata /THE_KEYSPACE_DIR/mc-1421-big-Data.db
result was:
Estimated droppable tombstones: 1.2
Having STCS and data disk usage of 80% (not enough free space for a
normal compaction), is it OK to just: 1. stop Cassandra, 2. delete mc-1421* and
then 3. start Cassandra?
I do not use a table default TTL (every row has its own TTL) and also no
updates occur to the rows.
I suppose that (because of the immutable nature of everything in Cassandra)
Cassandra would keep only the insertion timestamp + the original TTL and
computes the TTL of a row using these two and the current
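That supposition can be sketched as plain arithmetic: the remaining TTL is the original TTL minus the row's age, derived from just the write timestamp. A minimal illustration (names are mine, not Cassandra internals):

```python
import time

def remaining_ttl(write_time_s, original_ttl_s, now_s=None):
    """Remaining TTL = original TTL minus the row's age; <= 0 means expired."""
    if now_s is None:
        now_s = time.time()
    return original_ttl_s - int(now_s - write_time_s)

# A row written 100s ago with a 300s TTL has ~200s left.
```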
Just deleted multiple partitions from one of my tables; dumping sstables shows
that the data was successfully deleted, but the 'marked_deleted' rows for each
of the partitions still exist in the sstable and allocate storage.
Is there any way to get rid of these delete statements' storage overhead
(everyt
Found the answer: it would be deleted after gc_grace.
Just decreased gc_grace, ran compact, and the "marked_deleted" partitions
were purged from the sstable.
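Spelled out as commands, the sequence above would be something like the following (keyspace/table names and the gc_grace_seconds value are placeholders; note that lowering gc_grace_seconds below your repair interval risks resurrecting deleted data):

```shell
# 1. lower the tombstone grace period (default is 864000s = 10 days)
cqlsh -e "ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 3600;"
# 2. force a compaction so expired tombstones get purged
nodetool compact my_keyspace my_table
```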
On Wed, 24 Apr 2019 14:15:33 +0430 onmstester onmstester
wrote
Just delete
I just read this article by tlp:
https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
Noticed that:
>>We will need to set the tokens for the seed nodes in each rack manually. This
>>is to prevent each node from randomly calculating its own token ranges
You don't need to specify tokens
anymore, you can just use allocate_tokens_for_keyspace.
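In cassandra.yaml that looks like the fragment below (the keyspace name is a placeholder; the keyspace must already exist with its final replication settings before new nodes bootstrap):

```yaml
num_tokens: 16
allocate_tokens_for_keyspace: my_keyspace
# Cassandra 4.0+ alternative that needs no pre-existing keyspace:
# allocate_tokens_for_local_replication_factor: 3
```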
On Sat, May 4, 2019 at 2:14 AM onmstester onmstester
<mailto:onmstes...@zoho.com.invalid> wrote:
>
> I just read this article by tlp:
> https://thelastpickle.com/blog/2019/02/21/se
e vnodes - the number of
tokens per node and the number of racks.
Regards,
Anthony
On Sat, 4 May 2019 at 19:14, onmstester onmstester
<mailto:onmstes...@zoho.com.invalid> wrote:
I just read this article by tlp:
https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with
Thank you all!
On Sat, 20 Jul 2019 16:13:29 +0430 Rahul Singh
wrote
Hey Cassandra community ,
Thanks for all the feedback in the past on my cassandra knowledge base project.
Without the feedback cycle it’s not really for the community.
I've configured a simple cluster using two PCs with identical specs:
CPU: Core i5, RAM: 8GB DDR3, Disk: 1TB 5400rpm, Network: 1G (I've tested it
with iperf, it really is!)
using the common configs described in many sites including datastax itself:
cluster_name: 'MyCassandraCluster' num_tokens: 256 se
I have a single structured row as input at a rate of 10K per second. Each row
has 20 columns. Some queries should be answered on these inputs. Because most
of the queries need a different WHERE, GROUP BY or ORDER BY, the final data
model ended up like this:
primary key for table of query1 : ((colum
start.
--
Rahul Singh
rahul.si...@anant.us
Anant Corporation
On Feb 18, 2018, 6:29 AM -0500, onmstester onmstester
<onmstes...@zoho.com>, wrote:
I've configured a simple cluster using two PC with identical spec:
cpu core i5 RAM: 8GB ddr3 Disk: 1TB 5400rpm Network: 1 G (I'
re chokepoints in the GC cycle.
On Feb 18, 2018, 9:23 AM -0500, onmstester onmstester
<onmstes...@zoho.com>, wrote:
But monitoring Cassandra over JMX using JVisualVM shows no problem; less than
30% of heap size u
8:29 GMT-03:00 onmstester onmstester <onmstes...@zoho.com>:
I've configured a simple cluster using two PC with identical spec:
cpu core i5 RAM: 8GB ddr3 Disk: 1TB 5400rpm Network: 1 G (I've test it with
iperf, it really is!)
using the common configs described in many site
Another question on node density, in this scenario:
1. we should keep time series data of some years for a heavy-write system in
Cassandra (>10K ops per second)
2. the system is insert-only and inserted data would never be updated
3. in the partition key, we used the number of months since 1970, so da
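The "months since 1970" partition key in point 3 can be computed like this (a sketch; the actual schema is not shown in the thread):

```python
from datetime import datetime

def month_bucket(ts):
    """Months elapsed since the Unix epoch (1970-01), usable as a time-bucket partition key."""
    return (ts.year - 1970) * 12 + (ts.month - 1)

# January 1970 -> bucket 0; March 2019 -> bucket 590
```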
What I've got to set up my Apache Cassandra cluster are some servers with
20-core CPUs (2 threads per core), 128 GB RAM and 8 x 2TB disks.
Having read all over the web "do not use big nodes for your cluster", I'm
convinced to run multiple nodes on a single physical server.
So the question is which techno
On Tue, Feb 27, 2018 at 8:26 PM, onmstester onmstester
<onmstes...@zoho.com> wrote:
What i've got to set up my Apache Cassandra cluster are some Servers with 20
Core cpu * 2 Threads and 128 GB ram and 8 * 2TB disk.
Just r
Running this command:
nodetool cfhistograms keyspace1 table1
throws this exception in production server:
javax.management.InstanceNotFoundException:
org.apache.cassandra.metrics:type=Table,keyspace=keyspace1,scope=table1,name=EstimatePartitionSizeHistogram
But i have no problem in a test s
I'm using the int data type for one of my columns, but 99.99...% of its values
would never be > 65K. Should I change it to smallint (it would save some
gigabytes of disk in a few months), or would Cassandra compression take care of
it in storage?
What about the blob data type? Isn't it better to use it in s
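The raw, pre-compression saving from narrowing int (4 bytes) to smallint (2 bytes) is easy to estimate; one caveat the question glosses over is that CQL smallint is a signed 16-bit type, so values must fit in -32768..32767, not 65K:

```python
def raw_saving_bytes(rows, old_size=4, new_size=2):
    """Uncompressed per-column on-disk saving from narrowing a fixed-width type."""
    return rows * (old_size - new_size)

# 10 billion rows saves 20 GB (~18.6 GiB) before compression:
print(raw_saving_bytes(10_000_000_000) / 2**30)
```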
t from my iPhone
On Mar 6, 2018, at 3:29 AM, onmstester onmstester <onmstes...@zoho.com>
wrote:
Running this command:
nodetool cfhistograms keyspace1 table1
throws this exception in production server:
javax.management.InstanceNotFoundException:
org.apache.cassandra.metri
Would it be possible to copy/paste the Cassandra data directory from one of the
nodes (whose OS partition is corrupted) and use it in a fresh Cassandra node?
I've used RF=1, so that's my only chance!
apshot_restore_t.html#ops_backup_snapshot_restore_t
Cheers
Ben
On Thu, 8 Mar 2018 at 17:07 onmstester onmstester <onmstes...@zoho.com>
wrote:
--
Ben Slater
Chief Product Officer
arl.muel...@smartthings.com> wrote
If you're willing to do the data type conversion on insert and retrieval, then
you could use blobs as a sort of "adaptive length int", AFAIK.
On Tue, Mar 6, 2018 at 6:02 AM, onmstester onmstester
<onmstes...@zoho.com> wrote:
I&
could I calculate disk usage
approximately (without inserting actual data)?
On Sat, 10 Mar 2018 11:21:44 +0330 onmstester onmstester
<onmstes...@zoho.com> wrote
I've found out that blobs have no gain in storage savings!
I had some 16-digit number
I'm going to benchmark Cassandra's write throughput on a node with the
following spec:
CPU: 20 cores
Memory: 128 GB (32 GB as Cassandra heap)
Disk: 3 separate disks for OS, data and commitlog
Network: 10 Gb (tested with iperf)
OS: Ubuntu 16
Running Cassandra-stress:
cassandra-stress write n=100
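For reference, a fuller invocation of that kind of single-node write benchmark might look like this (all numbers illustrative, not taken from the thread):

```shell
cassandra-stress write n=1000000 cl=ONE -rate threads=200 -node 127.0.0.1
```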
e host test?
On Sun, Mar 11, 2018 at 10:44 PM, onmstester onmstester
<onmstes...@zoho.com> wrote:
I'm going to benchmark Cassandra's write throughput on a node with following
spec:
CPU: 20 Cores
Memory: 128 GB (32 GB as Cassandra heap)
Disk: 3 seprate disk for OS, data an
On Mon, 12 Mar 2018 09:34:26 +0330 onmstester onmstester
<onmstes...@zoho.com> wrote
Apache-cassandra-3.11.1
Yes, I'm doing a single-host test
On Mon, 12 Mar 2018 09:24:04 +0330 Jeff Jirsa <jji...@gmail.com>
wrote
-Henri Berthemet
From: onmstester onmstester [mailto:onmstes...@zoho.com]
Sent: Monday, March 12, 2018 10:48 AM
To: user <user@cassandra.apache.org>
Subject: Re: yet another benchmark bottleneck
Running two instances of Apache Cassandra on the same server, each having their own
On Mon, 12 Mar 2018 14:25:12 +0330 Jacques-Henri Berthemet
<jacques-henri.berthe...@genesys.com> wrote
Any errors/warning in Cassandra logs? What’s your RF?
Using 300MB/s of network bandwidth for only 130 op/s looks very high.
--
Jacques-Henri Berthemet
From:
concurrent_writes: 32
concurrent_counter_writes: 32
Jumping directly to 160 would be a bit high with spinning disks, maybe start
with 64 just to see if it gets better.
From: onmstester onmstester [mailto:onmstes...@zoho.com]
Sent: Monday, March 12, 2018 12:08 PM
on another host?
From: onmstester onmstester [mailto:onmstes...@zoho.com]
Sent: Monday, March 12, 2018 12:50 PM
To: user <user@cassandra.apache.org>
Subject: RE: yet another benchmark bottleneck
no luck even with 320 threads for write
There are a few known issues in the
write path, at least, that prevent scaling with high CPU core counts.
- Micke
On 03/12/2018 03:14 PM, onmstester onmstester wrote:
> I mentioned that already tested increasing client threads + many
> stress-client instances in one node + two s
Each Cassandra node creates 6 separate threads for incoming and outgoing
streams to the other nodes in the cluster. So with big clusters, for example
100 nodes, it would be more than 600 threads running in each Cassandra app,
and that would cause performance problems, so it's better to have multiple
small clu
Sweet spot for set and list item counts (in DataStax's documents, the max is
2 billion)?
Write and read performance of Set vs List vs a simple partition row?
Thanks in advance
Using Apache Cassandra 3.11.2, defined a table like this:
create table my_table(
partition text,
clustering1 int,
clustering2 text,
data set,
primary key (partition, clustering1, clustering2))
and con
Sent from my iPhone
On Jan 12, 2020, at 6:04 AM, onmstester onmstester
<mailto:onmstes...@zoho.com.invalid> wrote:
Using Apache Cassandra 3.11.2, defined a table like this:
create table my_table(
partition text,
clusterin
rows, so I suppose that the clustering key
restrictions have been pushed down to the storage engine.
Thanks Jeff
On Mon, 13 Jan 2020 08:38:44 +0330 onmstester onmstester
<mailto:onmstes...@zoho.com.INVALID> wrote
Done.
https://issues.apache
Sorry if it's trivial, but I do not understand how num_tokens affects
availability; with RF=3 and CLW,CLR=quorum, the cluster could tolerate losing
at most one node, and all of the tokens assigned to that node would also be
assigned to two other nodes no matter what num_tokens is, right?
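The intuition can be checked with a toy ring simulation (my own sketch, not Cassandra code): losing one node never breaks quorum at RF=3 regardless of num_tokens, but with many vnodes almost every *pair* of nodes ends up co-replicating some range, so more two-node failures become fatal:

```python
import random
from itertools import combinations

def replica_sets(num_nodes, num_tokens, rf=3, seed=42):
    """Random ring: each range is held by its owner plus the next rf-1 distinct nodes clockwise."""
    rng = random.Random(seed)
    ring = sorted((rng.random(), n) for n in range(num_nodes) for _ in range(num_tokens))
    sets = set()
    for i in range(len(ring)):
        reps, j = [], i
        while len(reps) < rf:
            node = ring[j % len(ring)][1]
            if node not in reps:
                reps.append(node)
            j += 1
        sets.add(frozenset(reps))
    return sets

def quorum_ok(sets, down, rf=3):
    """True if every range still has a quorum of live replicas."""
    need = rf // 2 + 1
    return all(len(s - down) >= need for s in sets)

def fatal_pairs(sets, num_nodes=12):
    """Count two-node failures that break quorum for some range."""
    return sum(not quorum_ok(sets, set(p)) for p in combinations(range(num_nodes), 2))

sets_1 = replica_sets(12, 1)      # one token per node
sets_256 = replica_sets(12, 256)  # classic vnode default
# Any single node down never breaks quorum, for either token count:
assert all(quorum_ok(sets_1, {n}) and quorum_ok(sets_256, {n}) for n in range(12))
# But far fewer two-node failures are fatal with 1 token per node:
print(fatal_pairs(sets_1), fatal_pairs(sets_256))
```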
out node 1 & 4, then ranges B & L would no longer
meet CL=quorum; but you can do that in the top diagram, since there are no
ranges shared between node 1 & 4.
Hope that helps.
- Max
On Feb 3, 2020, at 8:39 pm, onmstester onmstester
<mailto:onmstes...@zoho.com.INVALI
I just changed these properties to increase the flushed file size (decrease the
number of compactions):
memtable_allocation_type: from heap_buffers to offheap_objects
memtable_offheap_space_in_mb: from the default (2048) to 8192
Using default values for the other memtable/compaction/commitlog configurations.
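As a cassandra.yaml fragment, the two changed settings:

```yaml
memtable_allocation_type: offheap_objects   # was: heap_buffers
memtable_offheap_space_in_mb: 8192          # was: 2048 (default)
```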
Yes, you should handle the routing logic at the app level.
I wish there was another level of sharding (above dc, rack), such as "cluster",
to distribute data over multiple clusters! But I don't think there is any other
database that does such a thing for you.
Another problem with big clusters is for huge amount
Hi,
I'm using allocate_tokens_for_keyspace and num_tokens=32, and I want to extend
the size of some clusters.
I read in articles that for num_tokens=4, one should add 25% more of the
cluster size for the cluster to become balanced again.
1. For example, with num_tokens=4 and already having 16 n
Hi,
I think that Cassandra alone is not suitable for your use case. You can use a
mix of a distributed/NoSQL store (for storing the single records of whatever
makes your input big data) & a relational/single database (for the
transactional, non-big-data part).
Hi,
Logically, I do not need to use multiple DCs (the cluster is not geographically
separated), but I wonder if splitting the cluster into two halves (two separate
DCs) would decrease the overhead of node ack/communication and result in better
(write) performance?
rding in a way I
haven't personally figured out yet (maybe if you had a very high replica count
per DC, then using forwarding and EACH_QUORUM may get fun, but you'd be better
off dropping the replica count than coming up with stuff like this).
On Tue, Jul 28, 2020 at 8:27 PM onmstester
Hi,
I'm going to join multiple new nodes to an already existing and running
cluster. Each node should stream in >2TB of data, and it took a few days (with
500Mb streaming) to almost finish. But it got stuck streaming in from one final
node, and I cannot see any bottleneck on either side (sourc
No Secondary index, No SASI, No materialized view
On Sat, 01 Aug 2020 11:02:54 +0430 Jeff Jirsa wrote
Are there secondary indices involved?
On Jul 31, 2020, at 10:51 PM, onmstester onmstester
<mailto:onmstes...@zoho.com.inva
own risk).
On Jul 31, 2020, at 11:46 PM, onmstester onmstester
<mailto:onmstes...@zoho.com.invalid> wrote:
No Secondary index, No SASI, No materialized view
On Sat, 01 Aug 2020 11:02:54 +0430 Jeff Jirsa <mailto:jji...@gmail.com>
Is there any configuration
in Cassandra to force streamed-in data through the memtable-sstable cycle, to
have bigger sstables in the first place?
Forwarded message
From: onmstester onmstester
To: "user"
Date: Sun, 02 Aug 2020 08:35:3
e.g. if you're using LCS, change
sstable size from 160M to something higher), but there's no magic to join /
compact those data files on the sending side before sending.
On Mon, Aug 3, 2020 at 4:15 AM onmstester onmstester
<mailto:onmstes...@zoho.com.invalid> wrote:
IMHO (readi
either by sending bigger sstables on the sending side or by merging
sstables in the memtable on the receiving side)
(Just fixed a wrong word in my previous question)
On Wed, 05 Aug 2020 10:02:51 +0430 onmstester onmstester
<mailto:onmstes...@zoho.com.INVALID> wrote
OK. Thanks
I'm using STC
I used Cassandra Set (no experience with map), and one thing for sure is that
with Cassandra collections you are limited to a few thousand entries per row
(less than 10K for better performance)
On Fri, 18 Sep 2020 20:33:21 +0430 Attila Wind
wrote
Another workaround that I used for the UNREACHABLE nodes problem is to restart
the whole cluster and it would be fixed, but I don't know if that causes any
problems or not
On Fri, 18 Sep 2020 01:19:35 +0430 Paulo Motta
wrote
Oh, if you're adding t
Hi,
I've extended a cluster by 10% and after that, each hour, on some of the nodes
(which change randomly each time), "dropped mutations cross node" appears in
the logs (each time 1 or 2 drops, and sometimes some thousands, with cross-node
latency from 3000ms to 9ms or 90 seconds!) and insert r
Thanks,
I've made a lot of config changes to fix the problem but nothing worked (the
last one was disabling hints) and after a few days the problem was gone!!
The source of droppedCrossNode was changing every half hour and it was not
always the new nodes.
No difference between the new nodes and the old ones in c
Hi,
I've set up a cluster with:
3.11.2
30 nodes
RF=3, single DC, NetworkTopologyStrategy
Now I'm going to reduce RF to 2, but I've set up the cluster with vnodes=16 and
the allocation algorithm (allocate_tokens_for_keyspace) for the main keyspace
(whose RF I'm reducing), so is the procedure still: 1. alter
Hi,
I'm using ccm to create a cluster of 80 nodes on a physical server with 10
cores and 64GB of RAM, but the 43rd node always fails to start with the error:
java.lang.OutOfMemoryError: unable to create new native thread
apache cassandra 3.11.2
cassandra xmx600M
30GB of memory is still free
Hi,
I'm going to read all the data in the cluster as fast as possible. I'm aware
that Spark could do such things out of the box, but I just wanted to do it at a
low level to see how fast it could be. So:
1. retrieved partition keys on each node using nodetool ring token ranges and
getting distinct p
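For such a full parallel scan, the token-range splitting part can be sketched as follows (my own sketch; assumes the default Murmur3 partitioner, whose token space is [-2^63, 2^63-1]):

```python
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1

def split_token_range(n):
    """Split the full Murmur3 token range into n contiguous (start, end] chunks."""
    total = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + total * i // n for i in range(n)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))

ranges = split_token_range(8)
# Each chunk feeds one worker running a query shaped like:
#   SELECT ... WHERE token(pk) > :start AND token(pk) <= :end
```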
Thanx,
But I'm OK with the coordinator part; actually I was looking for a kind of read
CL to force reading from the coordinator only, with no other connections to
other nodes!
Forwarded message
From: Alex Ott
To: "user"
Date: Wed
chosen in practice)
On Nov 11, 2020, at 3:46 AM, Alex Ott <mailto:alex...@gmail.com> wrote:
if you force routing key, then the replica that owns the data will be selected
as coordinator
On Wed, Nov 11, 2020 at 12:35 PM onmstester onmstester
<mailto:onmstes...@zoho.com.inval
Forwarded message
From: onmstester onmstester
To: "user"
Date: Sat, 14 Nov 2020 08:24:14 +0330
Subject: Re: local read from coordinator
Forwarded message
Thank you Jeff,
I disabled dynami
Hi,
In an article by thelastpickle [1], I noticed:
The key here is to configure the cluster so that for a given datacenter the
number of racks is the same as the replication factor.
When using virtual machines as Cassandra nodes we have to set up the cluster
in a way that the number of racks is
Hi,
I'm using 3.11.2, just added the patch for zstd and changed table compression
from the default (LZ4) to zstd with level 1 and chunk 64kb. Everything is fine
(disk usage decreased by 40% and CPU usage is almost the same as before); only
the memtable switch count changed dramatically; with lz
n Sun, Feb 28, 2021 at 9:22 PM onmstester onmstester
<mailto:onmstes...@zoho.com.invalid> wrote:
Hi,
I'm using 3.11.2, just add the patch for zstd and changed table compression
from default (LZ4) to zstd with level 1 and chunk 64kb, everything is fine
(disk usage decreased by 40%
Besides the enhancements at the storage layer, I think there are a couple of
good ideas in RocksDB that could be used in Cassandra, like the one about
disabling sorting at the memtable-insert stage (write data fast, like a
commitlog) and only sorting the data when flushing/creating sst files.
Some posts/papers discuss this in more detail, for example the one from
thelastpickle:
https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
Which says:
Using statistical computation, the point where all clusters of any size always
had a good token rang
Hi,
What about this type of blades, which gives you about 12 (commodity) servers
in 3U:
https://www.supermicro.com/en/products/microcloud
On Tue, 03 Aug 2021 02:01:13 +0430 Joe Obernberger
wrote
Thank you Max. That is a solid choice.
Hi,
In our Cassandra cluster, because of big rows in the input data/data model with
a TTL of several months, we ended up using almost 80% of storage (5TB per
node), but having less than 20% CPU usage, almost all of which goes to writing
rows to memtables and compacting sstables, so a lot of CP
Hi,
We are using Apache Cassandra 3.11.2 with its default GC configuration (CMS and
...) on a 16GB heap. I inspected the GC logs using GCViewer and it reported 92%
throughput. Does that mean no further GC tuning is necessary, and everything is
OK with Cassandra's GC?
I can, but I thought 5TB per node already violated best practices (1-2 TB per
node); wouldn't it be a bad idea to 2x or 3x that?
On Mon, 15 Nov 2021 20:55:53 +0330 wrote
It sounds like you can downsize your cluster but increase your dri
Thank You
On Tue, 16 Nov 2021 10:00:19 +0330 wrote
> I can, but i thought with 5TB per node already violated best practices (1-2
>TB per node) and won't be a good idea to 2X or 3X that?
The main downside of larger disks is that it takes
related to
GC, regardless of what the GC metric you are looking at says,
you will need to address the issue, and that will probably involve
some GC tuning.
On 15/11/2021 06:00, onmstester onmstester wrote:
Hi,
We are using Apache Cassandra 3.11.2 with its
Hi,
I'm trying to set up a cluster of Apache Cassandra version 4.0.1 with 2 nodes:
1. on node1 (192.168.1.1), extracted the tar.gz and configured these in the yml:
- seeds: "192.168.1.1"
listen_address: 192.168.1.1
rpc_address: 192.168.1.1
2. started node1 and a few seconds later it is UN
3. on
Once again it was related to hostname configuration (I remember having problems
with this multiple times before, even with different applications); this time
the root cause was a typo in one of multiple config files for the hostname
(different name in /etc/hostname vs /etc/hosts)! I fixed that and now th
Hi,
I'm trying to evaluate the performance of Apache Cassandra v4.0.1 for
write-only workloads using on-premise physical servers.
On a single-node cluster, doing some optimizations I was able to push the
node's CPU >90%; throughput is high enough and CPU is the bottleneck, as I
expected. Then doing
Thanks,
I've got only one client, 10 threads and 1K async writes. This single client
was able to send 110K inserts/second to the single-node cluster, but it's only
sending 90K inserts/second to the cluster with 2 nodes (client CPU/network
usage is less than 20%)
Hi,
Has anyone measured the impact of wire encryption using TLS
(client_encryption/server_encryption) on cluster latency/throughput?
It may depend on the hardware or even the data model, but I already did some
sort of measurements and got to 2% for client encryption and 3-5% for client +
server encry
Cassandra maintains persistent connections, therefore the visible impact is on
connection establishment time (TLS handshakes are expensive). Encryption will
make thundering herd problems worse. You should watch out for those two issues.
Dinesh
On Feb 5, 2022, at 3:53 AM, onmstester onmstester <m
Hi,
Sometimes compactions get so slow (a few KBs per second for each
compaction) on a few nodes, which would be fixed temporarily by restarting
Cassandra (although it would come back a few hours later).
Copied the sstables related to the slow compactions to an isolated/single-node
cassan
Forgot to mention that i'm using default STCS for all tables
On Sun, 06 Mar 2022 12:29:52 +0330 onmstester onmstester
wrote
Hi,
Sometimes compactions getting so slow (a few KBs per second for each
compaction) on a few nodes which would be fixed temporarily by resta
I was there too! And found nothing to work around it except stopping
big/unnecessary compactions manually (using nodetool stop) whenever they
appear, via some shell scripts (using crontab)
On Fri, 02 Sep 2022 10:59:22 +0430 Gil Ganz wrote ---
+0430 onmstester onmstester via user
wrote ---
I was there too! and found nothing to work around it except stopping
big/unnecessary compactions manually (using nodetool stop) whenever they
appears by some shell scrips (using crontab)
On Fri, 02 Sep
PM Jim Shaw <mailto:jxys...@gmail.com> wrote:
if capacity allowed, increase compaction_throughput_mb_per_sec as 1st tuning,
and if still behind, increase concurrent_compactors as 2nd tuning.
Regards,
Jim
On Fri, Sep 2, 2022 at 3:05 AM onmstester onmstester via user
<ma
I patched this on 3.11.2 easily:
1. built the jar file from src and put it in the cassandra/lib directory
2. restarted the cassandra service
3. altered the table to use zstd compression and rebuilt the sstables
But that was at a time when 4.0 was not available yet, and after that I
upgraded to 4.0 immediately.
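Step 3 spelled out as commands (table name is a placeholder; the level/chunk values match the earlier message in this thread):

```shell
cqlsh -e "ALTER TABLE my_keyspace.my_table WITH compression =
  {'class': 'ZstdCompressor', 'compression_level': 1, 'chunk_length_in_kb': 64};"
# rewrite every existing sstable with the new compressor
nodetool upgradesstables -a my_keyspace my_table
```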
erver timestamp) are
common due to manual config or no one thought such problems could prevent a
Cassandra node from joining the cluster!
On Mon, 31 Jan 2022 16:35:50 +0330 onmstester onmstester
wrote ---
Once again it was related to hos
Another solution: distribute the data over more tables. For example, you could
create multiple tables based on the value or hash bucket of one of the columns;
by doing this, the current data volume and compaction overhead would be divided
by the number of underlying tables. Although there is a limitation on n
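A sketch of that hash-bucket routing (names and bucket count are hypothetical); both writes and reads derive the physical table name from the partition key:

```python
import zlib

NUM_BUCKETS = 8  # fixed up front: changing it later means re-routing existing data

def bucket_table(base_table, partition_key):
    """Pick one of NUM_BUCKETS physical tables for a given partition key."""
    bucket = zlib.crc32(str(partition_key).encode("utf-8")) % NUM_BUCKETS
    return f"{base_table}_{bucket}"

# The same key always routes to the same table, e.g. bucket_table("events", "sensor-42")
```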
Isn't there a very big (>40GB) sstable in /volumes/cassandra/data/data1? If
there is, you could split it or change your data model to prevent such sstables.
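Splitting such a giant sstable is done offline with sstablesplit (the keyspace/table/file parts of the path below are placeholders; the node must be stopped first, and -s is the target chunk size in MB):

```shell
# stop cassandra first -- sstablesplit must not run against a live node
sstablesplit --no-snapshot -s 50 \
  /volumes/cassandra/data/data1/<ks>/<table>/<big-sstable>-Data.db
```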
Forwarded message
From: Loïc CHANEL via user
To:
Date: Fri, 06 J