HDFS or S3 is a great place to dump this data. You could also consider a 
different compaction strategy for “COLD DATA” on a less powerful C* cluster 
whose purpose is write-only. In my opinion, C* is still better for data 
management than S3/HDFS. It depends on how easy you want retrieval and 
analysis to be.
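
To put rough numbers on the cost concern in the question quoted below, here is a 
back-of-the-envelope sketch. The 1TB-per-node figure comes from the question; the 
replication factor of 3 and the 5TB / 50TB data sizes are assumptions chosen only 
for illustration, and `nodes_needed` is a hypothetical helper, not part of any 
Cassandra tooling.

```python
import math


def nodes_needed(total_data_tb, replication_factor=3, per_node_tb=1.0):
    """Rough node count if all replicas must fit within per_node_tb per node.

    Ignores compaction headroom, compression, and uneven token distribution.
    """
    if per_node_tb <= 0:
        raise ValueError("per_node_tb must be positive")
    return math.ceil(total_data_tb * replication_factor / per_node_tb)


# Hypothetical sizes: 5 TB of hot data, growing to 50 TB once cold data piles up.
hot_only = nodes_needed(5)
hot_and_cold = nodes_needed(50)
print(hot_only, hot_and_cold)
```

Under these assumptions the node count scales linearly with retained data, 
regardless of how often the cold portion is read, which is the trade-off to weigh 
against moving it to S3/HDFS.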



--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 12, 2018, 8:30 AM -0400, Javier Pareja <pareja.jav...@gmail.com>, wrote:
> Hi,
>
> I understand that a well-designed Cassandra system allows querying ANY 
> data within it at incredible speed, as well as ingesting data at a very 
> fast pace.
>
> However, this data is going to grow until it is archived. As I see it, data 
> has two stages: HOT DATA, when it must be queryable at very low latency, and 
> COLD DATA, when it can still be queried and processed but we can tolerate a 
> (relatively long) delay. Cassandra is VERY good with HOT DATA, but it is not 
> very cost effective when the COLD DATA starts to grow, because each node only 
> stores a small amount (1TB recommended). The number of nodes needed starts to 
> grow even if this data is rarely queried!!
>
> Has anyone implemented a solution that "archives" data into cold(er) 
> storage outside of Cassandra, while keeping it available for (offline) 
> processing with Spark? For example, into a separate cluster with Hadoop/Hive?
> What is the standard in these cases?
>
> F Javier Pareja
