Direct IO with Spark and Hadoop over Cassandra

platon.tema Tue, 16 Sep 2014 01:59:21 -0700

Hi.

As I see it, the tools for massive data processing (map/reduce) over C* data include:

connectors
- Calliope http://tuplejump.github.io/calliope/
- DataStax Spark Cassandra Connector https://github.com/datastax/spark-cassandra-connector
- Stratio Deep https://github.com/Stratio/stratio-deep
- other free/commercial ones

runtime (job management and infrastructure)
- Spark
- Hadoop

But if I'm not mistaken, all these solutions load data over the network. In the best case the processing logic (some "job") runs on the same node where the corresponding range was found.
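
To illustrate what I mean by "the same node", here is a rough sketch of locality-aware planning. It is a toy model only: the ring, the node names, and the functions owner and plan_splits are made up for illustration and are not the API of any connector listed above. It assumes a single replica per range and that a node owns the range ending at its own token.

```python
from bisect import bisect_left

# Hypothetical toy ring: (token, node) pairs sorted by token.
# Tokens and node names are invented for this example.
RING = [(0, "node-a"), (100, "node-b"), (200, "node-c")]


def owner(token):
    """Return the node that owns `token`.

    Assumption: a node owns the range (previous_token, its_token],
    and tokens above the largest ring token wrap to the first node.
    """
    tokens = [t for t, _ in RING]
    index = bisect_left(tokens, token) % len(RING)
    return RING[index][1]


def plan_splits(ranges):
    """Map each (start, end] range to the node where its task should run."""
    return {r: owner(r[1]) for r in ranges}


# One split per range; the last range wraps around the ring.
splits = plan_splits([(0, 100), (100, 200), (200, 0)])
for token_range, node in splits.items():
    print(token_range, "->", node)
```

Even with such a plan, the task scheduled on a node still asks the local C* process for the rows instead of opening the sstable files itself, which is the point of my question.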

Why can't this logic use direct C* IO (reading sstables from disk)? Any cons?

Some time ago I read an article (I still can't find it) about academic research in which Hadoop was modified to support this direct IO mode. According to its benchmarks, direct IO gave a significant performance increase.
