Hi.
As far as I can see, the tools for massive data processing (map/reduce) over C* data include
connectors
- Calliope http://tuplejump.github.io/calliope/
- Datastax spark cassandra connector
https://github.com/datastax/spark-cassandra-connector
- Stratio Deep https://github.com/Stratio/stratio-deep
- other free/commercial
runtime (job management and infrastructure)
- Spark
- Hadoop
But if I'm not mistaken, all these solutions load data over the
network. In the best case, the logic instance (some "job") runs on the same node
(where the corresponding range was found).
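To illustrate what I mean by "the same node", here is a rough sketch of locality-aware scheduling as I understand it. It is not taken from any of the connectors above: the ring, the addresses and the function names are made up for illustration, and it assumes the usual rule that a node owns the range ending at its token. Even with such placement, the job still reads through the node's client interface rather than from the sstable files.

```python
from bisect import bisect_left

# Hypothetical ring: token -> node address (illustrative values only).
RING = {-6000: "10.0.0.1", 0: "10.0.0.2", 6000: "10.0.0.3"}

def owner(token, ring=RING):
    """Return the node owning `token`: the first ring token >= token, wrapping around."""
    tokens = sorted(ring)
    i = bisect_left(tokens, token)
    return ring[tokens[i % len(tokens)]]

def plan_tasks(split_tokens, ring=RING):
    """Group input splits (identified by a token) by the node that should run them."""
    plan = {}
    for t in split_tokens:
        plan.setdefault(owner(t, ring), []).append(t)
    return plan
```

A scheduler could then hand each group of splits to a worker on the matching host, for example `plan_tasks([-6000, 0, 6000, 7000])`.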
Why can't this logic use direct C* IO (reading sstables from disk)? Are there
any cons?
Some time ago I read an article (I still can't find it) about academic
research in which Hadoop was modified to support this direct IO mode.
According to its benchmarks, direct IO gave a significant performance
increase.