Hi.

As far as I can see, the tools for massive data processing (map/reduce) over C* data include:

connectors:
- Calliope http://tuplejump.github.io/calliope/
- DataStax Spark Cassandra Connector https://github.com/datastax/spark-cassandra-connector
- Stratio Deep https://github.com/Stratio/stratio-deep
- other free/commercial connectors

runtimes (job management and infrastructure):
- Spark
- Hadoop

But if I'm not mistaken, all of these solutions load data over the network. In the best case, the logic instance (some "job") runs on the same node where the corresponding token range was found.
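To make the "best case" concrete, here is a minimal Python sketch of locality-aware planning: one task per token range, each preferring the node that owns that range. It is an illustration under loud assumptions. The `ring` layout, `TokenRange`, `owner_of` and `plan_splits` are hypothetical names, the model assumes a single replica per range and no vnodes, and none of this is the API of the connectors listed above. Even with perfect placement, such a task would still fetch rows through the local node's query interface rather than reading SSTable files from disk.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRange:
    start: int  # exclusive
    end: int    # inclusive (start > end means the range wraps around the ring)


def owner_of(token, ring):
    """Return the node owning `token`.

    Hypothetical model: `ring` is a list of (token, node) sorted by token,
    and each node owns the range (previous_token, its_token].
    """
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)  # first ring token >= token
    if i == len(ring):
        i = 0  # past the last token: wrap around to the first node
    return ring[i][1]


def plan_splits(ring):
    """Plan one split per token range, preferring its (single) owner node."""
    splits = []
    for i, (tok, node) in enumerate(ring):
        prev_tok = ring[i - 1][0]  # for i == 0 this is the last token (wrap)
        splits.append((TokenRange(prev_tok, tok), node))
    return splits


if __name__ == "__main__":
    ring = [(0, "node-a"), (100, "node-b"), (200, "node-c")]
    for token_range, node in plan_splits(ring):
        print(f"run task for {token_range} on {node}")
    print(owner_of(50, ring))  # node-b
```

A scheduler such as Spark or Hadoop can then use the preferred node as a placement hint; the data itself still crosses the C* read path, which is the overhead the direct SSTable approach would avoid.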

Why can't this logic use direct C* I/O, i.e. read SSTables from disk? Are there any cons?

Some time ago I read an article (I still can't find it) about academic research in which Hadoop was modified to support this direct I/O mode. According to its benchmarks, direct I/O gave a significant performance increase.
