The problem is that the user also wants to access old data using CQL, not popping up a SparkSQL session just to fetch one or two old records.
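For concreteness, this is the kind of access in question: fetching a couple of two-year-old rows with a plain CQL query through the driver, with no Spark session involved. A minimal sketch using the DataStax Python driver; the contact point, keyspace, table, and column names are made-up placeholders, not anything from this thread.

    # Hypothetical schema: readings(sensor_id text, ts timestamp, value double,
    # PRIMARY KEY (sensor_id, ts)) -- placeholder names for illustration only.
    from datetime import datetime
    from cassandra.cluster import Cluster

    cluster = Cluster(["10.0.0.1"])         # placeholder contact point
    session = cluster.connect("metrics")    # hypothetical keyspace

    # A plain CQL slice query for two-year-old data -- no SparkSQL needed.
    rows = session.execute(
        "SELECT sensor_id, ts, value FROM readings "
        "WHERE sensor_id = %s AND ts >= %s AND ts < %s",
        ("sensor-42", datetime(2017, 10, 1), datetime(2017, 10, 2)),
    )
    for row in rows:
        print(row.sensor_id, row.ts, row.value)

    cluster.shutdown()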
On Oct 4, 2019 at 12:38, "Cedrick Lunven" <[email protected]> wrote:

> Hi,
>
> If you are using DataStax Enterprise, why not offload cold data to DSEFS (its HDFS implementation) in an analytics-friendly storage format like Parquet, and keep only the OLTP data in the Cassandra tables? The recommended size for DSEFS can go up to 30TB a node.
>
> I am pretty sure you are already aware of this option and would be curious to get your thoughts about this solution and its limitations.
>
> Note: that would also probably help you with your init-load/TWCS issue.
>
> My 2c.
> Cedrick
>
> On Tue, Oct 1, 2019 at 11:49 PM DuyHai Doan <[email protected]> wrote:
>
>> The client wants to be able to access cold data (2 years old) in the same cluster, so moving the data to another system is not possible.
>>
>> However, since we're using DataStax Enterprise, we can leverage Tiered Storage and store old data on spinning disks to save on hardware.
>>
>> Regards
>>
>> On Tue, Oct 1, 2019 at 9:47 AM Julien Laurenceau <[email protected]> wrote:
>> >
>> > Hi,
>> > Depending on the use case, you may also consider storage tiering with fresh data on a hot tier (Cassandra) and older data on a cold tier (Spark/Parquet or Presto/Parquet). It would be a lot more complex, but it may fit the budget more appropriately, and you may reuse some tech already present in your environment.
>> > You may even do subsampling while offloading the data from Cassandra, keeping one point out of 10 for older data, if subsampling makes sense for your data signal.
>> >
>> > Regards
>> > Julien
>> >
>> > On Mon, Sep 30, 2019 at 22:03, DuyHai Doan <[email protected]> wrote:
>> >>
>> >> Thanks all for your replies.
>> >>
>> >> The target deployment is on Azure, so with the nice disk snapshot feature, replacing a dead node is easier: no streaming from Cassandra.
>> >>
>> >> About compaction overhead, using TWCS with a 1-day bucket and removing read repair and subrange repair should be sufficient.
>> >>
>> >> Now the only remaining issue is quorum reads, which trigger read repair automagically.
>> >>
>> >> Before 4.0 there is no flag to turn it off, unfortunately.
>> >>
>> >> On Sep 30, 2019 at 15:47, "Eric Evans" <[email protected]> wrote:
>> >>
>> >> On Sat, Sep 28, 2019 at 8:50 PM Jeff Jirsa <[email protected]> wrote:
>> >>
>> >> [ ... ]
>> >>
>> >> > 2) The 2TB guidance is old and irrelevant for most people; what you really care about is how fast you can replace the failed machine.
>> >> >
>> >> > You'd likely be OK going significantly larger than that if you use a few vnodes, since that'll help rebuild faster (you'll stream from more sources on rebuild).
>> >> >
>> >> > If you don't want to use vnodes, buy big machines and run multiple Cassandra instances on them - it's not hard to run 3-4TB per instance and 12-16T of SSD per machine.
>> >>
>> >> We do this too. It's worth keeping in mind though that you'll still have a 12-16T blast radius in the event of a host failure. As the host density goes up, consider steps to make the host more robust (RAID, redundant power supplies, etc).
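To make the offload Cedrick and Julien describe concrete, here is a rough PySpark sketch (not the thread's actual pipeline): read the cold slice out of Cassandra through the spark-cassandra-connector, optionally keep roughly one point in ten, and write Parquet onto DSEFS. The connection host, keyspace/table, cutoff date, and dsefs:// path are all assumptions.

    # Assumes the spark-cassandra-connector is available on the classpath
    # (it is bundled with DSE Analytics). All names below are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("cold-data-offload")
        .config("spark.cassandra.connection.host", "10.0.0.1")   # placeholder
        .getOrCreate()
    )

    # Read only the cold slice (older than a fixed cutoff) from the OLTP table.
    cold = (
        spark.read.format("org.apache.spark.sql.cassandra")
        .options(keyspace="metrics", table="readings")            # hypothetical table
        .load()
        .where(F.col("ts") < F.lit("2017-10-01").cast("timestamp"))
    )

    # Optional subsampling: keep roughly one point out of ten for old data,
    # only if the data signal tolerates it (Julien's caveat).
    sampled = cold.sample(fraction=0.1, seed=42)

    # Write an analytics-friendly Parquet copy onto DSEFS (path is an assumption).
    sampled.write.mode("append").parquet("dsefs:///cold/readings/")

Once the Parquet copy is verified, the corresponding rows can be deleted or allowed to TTL out of the Cassandra table so that only the OLTP data stays there.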
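And a hedged sketch of the table-level settings DuyHai mentions: TWCS with a 1-day window and the pre-4.0 read-repair chances set to zero. The blocking read repair triggered by digest mismatches on QUORUM reads only becomes switchable in Cassandra 4.0, via the read_repair = 'NONE' table option. Keyspace and table names are placeholders.

    from cassandra.cluster import Cluster

    cluster = Cluster(["10.0.0.1"])   # placeholder contact point
    session = cluster.connect()

    # TWCS with 1-day buckets, plus the pre-4.0 background read-repair chances
    # disabled. The table name is a placeholder.
    session.execute("""
        ALTER TABLE metrics.readings
        WITH compaction = {
            'class': 'TimeWindowCompactionStrategy',
            'compaction_window_unit': 'DAYS',
            'compaction_window_size': '1'
        }
        AND read_repair_chance = 0.0
        AND dclocal_read_repair_chance = 0.0
    """)

    # On Cassandra 4.0+ the repair triggered by QUORUM reads can also be turned off:
    # session.execute("ALTER TABLE metrics.readings WITH read_repair = 'NONE'")

    cluster.shutdown()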
