Thanks all for the good tips!
-Tobias
From: Eric Evans
Reply to: "user@cassandra.apache.org"
Date: Tuesday, 21 April 2020 at 16:02
To: "user@cassandra.apache.org"
Subject: Re: Write to SSTables to do really fast initial load of database (e.g. for migration)
On Tue, Apr 21, 2020 at 4:16 AM Er
Not directly related, but you can try using zstd as the compression algorithm.
In my tests it gave a faster offload, with a slightly worse compression ratio.
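A minimal sketch of enabling it on an existing table, assuming Cassandra 4.0+
or DSE 6.8+ (where ZstdCompressor is available); the keyspace and table names
are placeholders:

    -- sketch: 'compression_level' is optional, 3 is the zstd default
    ALTER TABLE my_keyspace.my_table
      WITH compression = {'class': 'ZstdCompressor', 'compression_level': 3};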
Seems as if sstable2json is deprecated; see [1] and [2].
So, dsbulk [3] it is, I guess.
I downloaded it and crafted the following commandline from the docs [4]
for my use case:
$ ../dsbulk-1.5.0/bin/dsbulk unload -h '["MY_CASSANDRA_IP"]' \
--driver.advanced.auth-provider.class PlainTextAuthProvider
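A fuller version of that call, with placeholder credentials and a placeholder
keyspace/table (-u/-p/-k/-t/-url are the standard DSBulk shortcuts), would look
something like:

    $ ../dsbulk-1.5.0/bin/dsbulk unload -h '["MY_CASSANDRA_IP"]' \
        --driver.advanced.auth-provider.class PlainTextAuthProvider \
        -u my_user -p my_password \
        -k my_keyspace -t my_table -url /tmp/unload_dir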
If the memory wasn’t being used, and it got pushed to swap, then the right
thing happened. It’s a common misconception that swap is bad. The use of swap
isn’t bad. What is bad is if you find data churning in and out of swap space a
lot, so that your latency increases due to the page faults.
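If you want to verify whether that churn is actually happening, one way
(assuming a Linux host; these are standard procps/util-linux tools) is:

    $ vmstat 1 5
      # watch the si/so columns: sustained non-zero values mean pages are
      # being swapped in/out continuously, which is when latency suffers
    $ swapon --show
      # shows how much of each swap device is actually in use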
Marc,
In DSE, CQL offers an option called CAPTURE, which can save the output of a
query to a specified file. Maybe you can use that option to save all the values
you need to that file, to see all signalids or whichever columns you need. The
file may grow big depending on your dataset, so I am not sure what limit it imposes.
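A minimal sketch of that in cqlsh, with a hypothetical output path and table:

    cqlsh> CAPTURE '/tmp/signalids.txt';
    cqlsh> SELECT signalid, insertdate FROM my_keyspace.my_table;
    cqlsh> CAPTURE OFF;

Note that CAPTURE writes cqlsh's formatted text output, not CSV.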
This sounds like a promising way; thank you for bringing this up!
I will see if I can manage it with this approach.
Best regards,
Marc Richter
On 22.04.20 15:38, Durity, Sean R wrote:
> I thought this might be a single-time use case request. I think my first
> approach would be to use something
DSBulk also works with JSON...
If the transformations of the data are complex, I would go with Spark running
in local mode and process the data there...
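A sketch of the JSON variant, with placeholder names (-c selects the
connector):

    $ dsbulk unload -h MY_CASSANDRA_IP -k my_keyspace -t my_table \
        -c json -url /tmp/export_json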
On Wed, Apr 22, 2020 at 3:38 PM Durity, Sean R wrote:
> I thought this might be a single-time use case request. I think my first
> approach would be to use something
I thought this might be a single-time use case request. I think my first
approach would be to use something like dsbulk to unload the data and then
reload it into a table designed for the query you want to do (as long as you
have adequate disk space). I think like a DBA/admin first. Dsbulk creates
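A sketch of that unload-and-reload flow, with hypothetical names (the second
table would be modeled around the query you want to serve):

    $ dsbulk unload -h MY_CASSANDRA_IP -k my_ks -t events -url /tmp/events_dump
      ...create the new, query-oriented table in cqlsh...
    $ dsbulk load -h MY_CASSANDRA_IP -k my_ks -t events_by_insertdate \
        -url /tmp/events_dump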
I am running Spark (max heap 4G) and a Java application (4G) alongside my
Cassandra server (8G).
After heavy loading, if I run a Spark process, some main memory is pushed
into swap. But if I restart Cassandra and then execute the Spark process,
memory is not pushed into swap.
The idea behind asking the above
Hi Jeff,
thank you for your exhaustive and detailed answer!
Also, a very big "Thank you!" to all the other repliers; I hope you
understand that I am summarizing all your feedback in this single answer.
From what I understand from your answers, Cassandra seems to be
optimized to store (and read) data
Hi Marc,
On Tue, Apr 21, 2020 at 4:20 PM Marc Richter wrote:
> The database is already about 260 GB in size.
> I now need to know what is the most recent entry in it; the correct
> column to learn this would be "insertdate".
>
> In SQL I would do something like this:
>
> SELECT insertdate
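In CQL the closest per-partition equivalent would be something like the sketch
below, assuming the table is partitioned by signalid and clustered by
insertdate (column names taken from the thread; everything else is a
placeholder):

    -- sketch: assumes signalid is the partition key and
    -- insertdate a clustering column
    SELECT insertdate FROM my_keyspace.my_table
    WHERE signalid = 12345
    ORDER BY insertdate DESC
    LIMIT 1;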