The problem is that the user wants to access the old data using CQL as well, not
spin up SparkSQL just to fetch one or two old records.
On Oct 4, 2019 at 12:38, "Cedrick Lunven" wrote:
> Hi,
>
> If you are using DataStax Enterprise, why not offload cold data to DSEFS
> (the HDFS implementation) with [...]
Hi,
If you are using DataStax Enterprise, why not offload cold data to DSEFS
(the HDFS implementation) with an analytics-friendly storage format like
Parquet, and keep only the OLTP data in the Cassandra tables? The recommended
size for DSEFS can go up to 30TB per node.
I am pretty sure you are already aware of this option [...]
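A minimal PySpark sketch of that offload, assuming a hypothetical metrics.events table with an event_time column, an illustrative two-year cutoff, and a made-up DSEFS path; run it with dse spark-submit so the Spark Cassandra Connector is on the classpath:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("cold-data-offload").getOrCreate()

    # Read the full table through the Spark Cassandra Connector.
    events = (spark.read
              .format("org.apache.spark.sql.cassandra")
              .options(keyspace="metrics", table="events")   # assumed names
              .load())

    # Keep only rows older than ~2 years for the cold tier.
    cold = (events
            .where(F.col("event_time") < F.date_sub(F.current_date(), 730))
            .withColumn("year", F.year("event_time"))
            .withColumn("month", F.month("event_time")))

    # Write them as Parquet on DSEFS, partitioned for later analytics scans.
    (cold.write
         .mode("append")
         .partitionBy("year", "month")
         .parquet("dsefs:///cold/events"))   # adjust the DSEFS path to your cluster

Once the cold rows are verified on DSEFS they can be deleted from the table, or simply left to expire via TTL.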
The client wants to be able to access cold data (2 years old) in the
same cluster, so moving the data to another system is not possible.
However, since we're using DataStax Enterprise, we can leverage Tiered
Storage and store old data on spinning disks to save on hardware.
Regards
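For reference, a rough sketch of what that could look like; the dse.yaml tier layout and the compaction class/option names below are assumptions based on the DSE Tiered Storage documentation as I remember it, so verify them against the docs for your DSE version before use:

    # Assumed dse.yaml fragment (per node): an SSD tier in front of a spinning-disk tier.
    #
    #   tiered_storage_options:
    #       time_tiers:
    #           tiers:
    #               - paths: [ /mnt/ssd/cassandra/data ]
    #               - paths: [ /mnt/hdd/cassandra/data ]
    #
    from cassandra.cluster import Cluster

    session = Cluster(["10.0.0.1"]).connect()   # assumed contact point

    # Point the table at the tiering config; class, strategy, and option names are assumptions.
    session.execute("""
        ALTER TABLE metrics.events
        WITH compaction = {
            'class': 'org.apache.cassandra.db.compaction.TieredCompactionStrategy',
            'tiering_strategy': 'TimeWindowStorageStrategy',
            'config': 'time_tiers',
            'max_tier_ages': '2592000'
        }
    """)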
On Tue, Oct 1, 2019 at [...] wrote:
Hi,
Depending on the use case, you may also consider storage tiering with fresh
data on a hot tier (Cassandra) and older data on a cold tier (Spark/Parquet or
Presto/Parquet). It would be a lot more complex, but it may fit the budget
more appropriately, and you may be able to reuse some tech already present in
your [...]
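To make the reuse concrete, a sketch of answering one query across both tiers from Spark, assuming the same hypothetical metrics.events hot table and a Parquet cold tier; the paths and the predicate are made up:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("tiered-read").getOrCreate()

    # Hot tier: recent rows still in Cassandra.
    hot = (spark.read
           .format("org.apache.spark.sql.cassandra")
           .options(keyspace="metrics", table="events")   # assumed names
           .load())

    # Cold tier: historical rows already offloaded to Parquet.
    cold = spark.read.parquet("dsefs:///cold/events")     # assumed location
    cold = cold.select(*hot.columns)                      # drop extra partition columns

    # One logical view over both tiers.
    all_events = hot.unionByName(cold)
    all_events.where("sensor_id = 42").show()             # hypothetical predicate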
Thanks all for your replies.
The target deployment is on Azure, so with the nice disk snapshot feature,
replacing a dead node is easier: no streaming from Cassandra.
About compaction overhead, using TWCS with a 1-day bucket and removing read
repair and subrange repair should be sufficient.
Now the only [...]
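For the read-repair part, a sketch of the table settings through the Python driver; the keyspace/table and contact point are placeholders, and the read_repair_chance table options were removed in Apache Cassandra 4.0, so check your version. Subrange repair itself is driven externally (e.g. nodetool repair with --start-token/--end-token), not by a table option:

    from cassandra.cluster import Cluster

    session = Cluster(["10.0.0.1"]).connect()   # assumed contact point

    # TWCS with 1-day buckets, and read repair disabled on the table.
    session.execute("""
        ALTER TABLE metrics.events
        WITH compaction = {
            'class': 'TimeWindowCompactionStrategy',
            'compaction_window_unit': 'DAYS',
            'compaction_window_size': '1'
        }
        AND read_repair_chance = 0.0
        AND dclocal_read_repair_chance = 0.0
    """)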
On Sat, Sep 28, 2019 at 8:50 PM Jeff Jirsa wrote:
[ ... ]
> 2) The 2TB guidance is old and irrelevant for most people; what you really
> care about is how fast you can replace the failed machine.
>
> You’d likely be ok going significantly larger than that if you use a few
> vnodes, since that’ll [...]
I noticed that the compaction overhead has not been taken into account
during capacity planning; I think this is because the compression being used
is expected to compensate for it. Is my assumption correct?
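A back-of-the-envelope way to sanity-check that assumption, with every number below made up for illustration: with TWCS and 1-day buckets, a compaction normally rewrites only the SSTables of a single window, so the transient disk overhead is roughly one window's worth of data rather than a large fraction of the node:

    raw_data_tb       = 50.0    # assumed uncompressed footprint per node
    compression_ratio = 0.4     # assumed on-disk ratio; zstd on time series is often better
    on_disk_tb        = raw_data_tb * compression_ratio   # ~20 TB, the stated target

    windows            = 730                               # ~2 years of 1-day buckets
    largest_window_tb  = on_disk_tb / windows              # size of one TWCS window
    headroom_needed_tb = largest_window_tb                 # transient space during its compaction

    print(f"on disk: {on_disk_tb:.1f} TB, "
          f"transient compaction headroom: ~{headroom_needed_tb:.3f} TB per window")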
On Sun, Sep 29, 2019 at 11:04 PM Jeff Jirsa wrote:
>
>
> > On Sep 29, 2019, at 12:30 AM, DuyHai Doan wrote: [...]
> On Sep 29, 2019, at 12:30 AM, DuyHai Doan wrote:
>
> Thank you Jeff for the hints.
>
> We are targeting to reach 20TB per machine using TWCS and 8 vnodes (using
> the new token allocation algorithm). Also, we will try the new zstd
> compression.
I’d probably still be inclined to run two instances per [...]
Thank you Jeff for the hints.
We are targeting to reach 20TB per machine using TWCS and 8 vnodes (using
the new token allocation algorithm). Also, we will try the new zstd
compression.
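A sketch of those two settings, again on a hypothetical metrics.events table; ZstdCompressor is the compressor that ships with Apache Cassandra 4.0+ (check availability in your DSE/Cassandra version), and the cassandra.yaml lines are shown as comments because they are node-level settings applied before a node bootstraps:

    from cassandra.cluster import Cluster

    session = Cluster(["10.0.0.1"]).connect()   # assumed contact point

    # zstd table compression (compression_level is optional; 3 is the default).
    session.execute("""
        ALTER TABLE metrics.events
        WITH compression = {'class': 'ZstdCompressor', 'compression_level': '3'}
    """)

    # cassandra.yaml, per node, before it joins the ring:
    #   num_tokens: 8
    #   allocate_tokens_for_keyspace: metrics    # assumed keyspace; enables the newer allocation algo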
About transient replication, the underlying trade-offs and semantics
are hard for most people to understand (for example, [...]
A few random thoughts here:
1) 90 nodes / 900TB in a cluster isn’t that big. A petabyte per cluster is a
manageable size.
2) The 2TB guidance is old and irrelevant for most people; what you really care
about is how fast you can replace the failed machine.
You’d likely be ok going significantly larger than that if you use a few
vnodes, since that’ll [...]
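A quick worked example of point 2), with the streaming throughput purely assumed, to show why replacement time rather than a fixed per-node size cap is the real constraint:

    node_data_tb      = 20.0    # data held by the dead node (the target discussed above)
    stream_gbit_per_s = 5.0     # assumed aggregate streaming throughput into the replacement

    seconds = node_data_tb * 8 * 1000 / stream_gbit_per_s   # TB -> Gbit, then divide by rate
    print(f"~{seconds / 3600:.0f} hours to restream {node_data_tb:.0f} TB "
          f"at {stream_gbit_per_s} Gbit/s")

That streaming cost is also what the Azure disk-snapshot approach mentioned above avoids entirely.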