Hi, I understand that a well-designed Cassandra system can serve queries on ANY data it holds at very low latency, and can also ingest data at a very fast pace.
However, this data keeps growing until it is archived. As I see it, data has two stages: HOT DATA, which must be queryable at very low latency, and COLD DATA, which can still be queried and processed but where a (relatively long) delay is acceptable.

Cassandra is VERY good with HOT DATA, but it is not very cost-effective once the COLD DATA starts to grow, because each node only stores a small amount (1 TB recommended). The number of nodes needed keeps growing even if this data is rarely queried!

Has anyone implemented a solution that "archives" data into cold(er) storage outside of Cassandra, while keeping it available for (offline) processing with Spark? For example, into a separate cluster with Hadoop/Hive? What is the standard approach in these cases?

F Javier Pareja