Re: Direct IO with Spark and Hadoop over Cassandra

DuyHai Doan Tue, 16 Sep 2014 04:41:16 -0700

If you access the C* sstables directly from those frameworks, you will:

1) miss live data that is still in memory (in the memtables) and has not been flushed to disk yet

2) skip the Dynamo layer of C*, which is responsible for data consistency across replicas
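
To make both points concrete, here is a toy sketch in Python. It is not Cassandra's storage engine or API: `ToyNode`, `read_sstables_only` and `coordinator_read` are hypothetical names invented for this illustration, and the model ignores the commit log, compaction, tombstones and tunable consistency levels. It only shows why a reader that looks at on-disk files alone can return stale data.

```python
# Toy model only: every name here is hypothetical, none of it is Cassandra's API.

class ToyNode:
    """One replica: recent writes live in a memtable until they are flushed."""

    def __init__(self):
        self.memtable = {}   # key -> (timestamp, value), held in memory
        self.sstables = []   # list of immutable dicts, standing in for files on disk

    def write(self, key, value, ts):
        self.memtable[key] = (ts, value)

    def flush(self):
        # Dump the memtable to a new "sstable" and start a fresh memtable.
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}

    def read(self, key):
        # Normal read path: merge memtable and sstables, newest timestamp wins.
        found = [t[key] for t in self.sstables + [self.memtable] if key in t]
        return max(found, default=None)

    def read_sstables_only(self, key):
        # What a "direct IO" job would see: on-disk data only.
        found = [t[key] for t in self.sstables if key in t]
        return max(found, default=None)


def coordinator_read(replicas, key):
    # Stand-in for the Dynamo layer: ask the replicas, return the newest version.
    answers = [a for a in (r.read(key) for r in replicas) if a is not None]
    return max(answers, default=None)


a, b = ToyNode(), ToyNode()
for node in (a, b):
    node.write("k", "v1", ts=1)
    node.flush()
a.write("k", "v2", ts=2)   # newer write: only on replica a, not flushed yet

print(a.read_sstables_only("k"))    # (1, 'v1'): point 1, the memtable is missed
print(b.read_sstables_only("k"))    # (1, 'v1'): point 2, b never got the update
print(coordinator_read([a, b], "k"))  # (2, 'v2'): the normal read path
```

In this sketch a job reading the files of either node gets `v1`, while a read that goes through the coordinator gets `v2`.
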

On 16 Sept. 2014 at 10:58, "platon.tema" <platon.t...@yandex.ru> wrote:

> Hi.
>
> As far as I can see, the tools for massive data processing (map/reduce)
> over C* data include
>
> connectors
> - Calliope http://tuplejump.github.io/calliope/
> - DataStax spark cassandra connector https://github.com/datastax/spark-cassandra-connector
> - Stratio Deep https://github.com/Stratio/stratio-deep
> - other free/commercial
>
> runtime (job management and infrastructure)
> - Spark
> - Hadoop
>
> But if I'm not mistaken, all these solutions load data over the network.
> In the best case the processing instance (some "job") runs on the same node
> where the corresponding range is located.
>
> Why can't this logic use direct C* IO (reading sstables from disk)? Any
> cons?
>
> Some time ago I read an article (I still can't find it) about academic
> research in which Hadoop was modified to support this direct IO mode.
> According to its benchmarks, direct IO gave a significant performance
> increase.
>
