Re: Multiple compactions to same disk with 3.11.4

2019-10-01 Thread Matthias Pfau
You are right, you could set concurrent_compactors to 1 to just allow a single compaction at a time. However, that isn't feasible in our scenario with multiple data dirs as compactions would accumulate. We want to run multiple compactions in parallel but only one per data dir... Best, Matthias
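
For reference, the two settings discussed in this thread live in cassandra.yaml. The fragment below is only a sketch: the values are illustrative assumptions, not recommendations from the thread, and neither setting limits compactions per data dir, which is the behaviour asked for above.

```yaml
# cassandra.yaml (Cassandra 3.11) -- illustrative values only
# Maximum number of compactions running at once on this node (node-wide, not per data dir).
concurrent_compactors: 1
# Total compaction throughput cap for the node, in MB/s; 0 disables throttling.
compaction_throughput_mb_per_sec: 16
```

With several data dirs on separate spinning disks, a value of 1 serializes all compactions across the node, which is why it was ruled out in the message above.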

Re: snapshots and 'dot' prefixed _index directories

2019-10-01 Thread Elliott Sims
The tar error is because tar also looks for metadata changes. In this case, it's the refcount that's changing and causing the error. I just switched to using bsdtar instead as a workaround. On Tue, Oct 1, 2019, 5:37 PM James A. Robinson wrote: > Hi folks, > > > I took a nodetool snapshot of a

snapshots and 'dot' prefixed _index directories

2019-10-01 Thread James A. Robinson
Hi folks, I took a nodetool snapshot of a keyspace in my cassandra 3.11 cluster and it included directories with a 'dot' prefix (often called a hidden file/directory). As an example: /var/lib/cassandra/data/impactvizor/tableau_notification-04bfb600291e11e7aeab31f0f0e5804b/snapshots/1569974640/.

Re: Challenge with initial data load with TWCS

2019-10-01 Thread DuyHai Doan
Thanks Alex for confirming. On 30 Sept 2019 at 09:17, "Oleksandr Shulgin" wrote: > On Sun, Sep 29, 2019 at 9:42 AM DuyHai Doan wrote: > >> Thanks Jeff for sharing the ideas. I have some questions though: >> >> - CQLSSTableWriter and explicitly break between windows --> Even if >> you break betwe

Re: Cluster sizing for huge dataset

2019-10-01 Thread DuyHai Doan
The client wants to be able to access cold data (2 years old) in the same cluster, so moving data to another system is not possible. However, since we're using DataStax Enterprise, we can leverage Tiered Storage and store old data on spinning disks to save on hardware. Regards On Tue, Oct 1, 2019 a

Re: Multiple compactions to same disk with 3.11.4

2019-10-01 Thread Elliott Sims
There's a concurrent_compactors parameter in cassandra.yaml that does exactly what the name says. You may also find compaction_throughput_mb_per_sec useful. On Tue, Oct 1, 2019 at 8:16 AM Matthias Pfau wrote: > Hi there, > we recently upgraded from 2.2 to 3.11.4. > > Unfortunately, we are runnin

Re: Sizing a cluster

2019-10-01 Thread jagernicolas
Hi Léo, thanks for the links. Is that the size of the uncompressed data or the data once it has been inserted and compressed by Cassandra? The size of 0.5MB is the size of the data we send, before Cassandra does compression, if any. Looking at the cassandra compression : http://cassandra.apa

Multiple compactions to same disk with 3.11.4

2019-10-01 Thread Matthias Pfau
Hi there, we recently upgraded from 2.2 to 3.11.4. Unfortunately, we are now running into problems with compaction scheduling. From time to time, a bunch of compactions (e.g. 6) are scheduled for the same data dir. This makes no sense for spinning disks as it will slow down all compaction

Re: Drastic increase of bloom filter sizer after upgrading from 2.2.14 to 3.11.4

2019-10-01 Thread Matthias Pfau
Just a short follow-up on this: After running upgradesstables for a CF, off-heap memory used by bloom filters increases by a factor between 6 and 12 in our case. This is a Cassandra bug. Bloom filters are apparently calculated before splitting the sstable for multiple data dirs. When you delete

Re: Sizing a cluster

2019-10-01 Thread Léo FERLIN SUTTON
Hi! I'm not an expert, but don't forget that Cassandra needs space to do its compactions. Take a look at the worst-case scenarios from this DataStax grid: https://docs.datastax.com/en/dse-planning/doc/planning/capacityPlanning.html#capacityPlanning__disk > The size of a picture + data is about

Sizing a cluster

2019-10-01 Thread jagernicolas
Hi, We want to use Cassandra to store camera detections. The size of a picture + data is about 0.5MB. We are starting with 5 devices, but we are targeting 50 devices for next year, and could go up to 1000. To summarize: * Number of sources: 5 - 50 - 1000 (src) * Frequency o

Re: Cluster sizing for huge dataset

2019-10-01 Thread Julien Laurenceau
Hi, Depending on the use case, you may also consider storage tiering with fresh data on a hot tier (Cassandra) and older data on a cold tier (Spark/Parquet or Presto/Parquet). It would be a lot more complex, but may fit the budget better and you may reuse some tech already present in your e