Hi,

I understand that a well-designed Cassandra system lets you query ANY
data within it at very high speed, as well as ingest data at a very
fast pace.

However, this data will keep growing until it is archived. As I see it,
data has two stages. HOT DATA must be queryable at very low latency.
COLD DATA can still be queried and processed, but we can tolerate a
(relatively long) delay. Cassandra is VERY good with HOT DATA, but it
is not very cost-effective once the COLD DATA starts to grow, because
each node only stores a small amount (around 1 TB is the recommended
figure). The number of nodes needed keeps growing even if this data is
rarely queried!!
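To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes a replication factor of 3 (my assumption, a common choice) and the ~1 TB per node figure above. The data sizes (10 TB hot, 90 TB cold) are purely hypothetical:

```python
import math

def nodes_needed(data_tb: float, per_node_tb: float = 1.0,
                 replication_factor: int = 3) -> int:
    """Rough node count: every replica of the data must fit under per_node_tb.

    replication_factor=3 and per_node_tb=1.0 are assumptions, not measured values.
    """
    return math.ceil(data_tb * replication_factor / per_node_tb)

hot_tb, cold_tb = 10.0, 90.0  # hypothetical data sizes in TB

print(nodes_needed(hot_tb))            # hot data only -> 30
print(nodes_needed(hot_tb + cold_tb))  # hot + cold kept in Cassandra -> 300
```

Under these assumptions, keeping the rarely queried cold data in Cassandra multiplies the cluster size by ten, even though only the hot tenth needs low latency.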

Has anyone implemented a solution that "archives" data into cold(er)
storage outside of Cassandra, while keeping it available for (offline)
processing with Spark? For example, into a separate Hadoop/Hive cluster?
What is the standard approach in these cases?

F Javier Pareja
