If you access the C* sstables directly from those frameworks, you will:

1) miss live data that is still in memory and has not been flushed to disk yet
2) skip the Dynamo layer of C*, which is responsible for data consistency

On 16 Sept. 2014 at 10:58, "platon.tema" <platon.t...@yandex.ru> wrote:

> Hi.
>
> As I see it, massive data processing tools (map\reduce) for C* data include
>
> connectors
> - Calliope http://tuplejump.github.io/calliope/
> - Datastax spark cassandra connector https://github.com/datastax/spark-cassandra-connector
> - Stratio Deep https://github.com/Stratio/stratio-deep
> - other free\commercial
>
> runtime (job management and infrastructure)
> - Spark
> - Hadoop
>
> But if I'm not mistaken, all these solutions use the network for data loading.
> In the best case the logic instance (some "job") runs on the same node (where the
> corresponding range was found).
>
> Why can't this logic use direct C* IO (sstable reading from disk)? Any
> cons?
>
> Some time ago I read an article (still can't find it) about academic
> research in which Hadoop was modified to support this direct IO mode.
> According to its benchmarks, direct IO gave a significant performance
> increase.
>
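
To make point 1) above concrete, here is a minimal Python sketch of a toy write path. It is not Cassandra's code: the class ToyNode and its methods are hypothetical names made up for illustration, and real C* reconciles versions by write timestamp rather than by table order. It only shows that a reader going straight to the on-disk tables sees stale or missing rows until a flush happens. Point 2) (consistency across replicas) is not modeled here.

```python
class ToyNode:
    """Toy model of one node's write path (an assumption, not Cassandra's code)."""

    def __init__(self):
        self.memtable = {}   # recent writes, held in memory only
        self.sstables = []   # immutable "on-disk" tables, oldest first

    def write(self, key, value):
        # Writes land in the memtable first; nothing reaches "disk" yet.
        self.memtable[key] = value

    def flush(self):
        # Dump the memtable to a new immutable table, then start a fresh one.
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}

    def read(self, key):
        # Normal read path: check the memtable, then fall back to the tables.
        if key in self.memtable:
            return self.memtable[key]
        return self.read_sstables_only(key)

    def read_sstables_only(self, key):
        # What a direct-IO job would see: flushed tables only, newest first.
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None


node = ToyNode()
node.write("user:1", "alice")
node.flush()                           # "alice" is now on disk
node.write("user:1", "alice-updated")  # still in memory only
node.write("user:2", "bob")            # still in memory only

via_node = (node.read("user:1"), node.read("user:2"))
direct = (node.read_sstables_only("user:1"), node.read_sstables_only("user:2"))

print(via_node)  # ('alice-updated', 'bob')
print(direct)    # ('alice', None)
```

In this toy run the direct reader gets the old value for one key and nothing at all for the other, which is the gap a job reading sstables off disk would have to live with or work around.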