Re: Direct IO with Spark and Hadoop over Cassandra

2014-09-16 Thread platon.tema
will also have to read/resolve multiple row instances (if you update records) and tombstones (if you delete records) yourself. *From:* platon.tema [mailto:platon.t...@yandex.ru] *Sent:* Tuesday, September 16, 2014 1:51 PM *To:* user@cassandra.apache.org *Subject:* Re: Direct IO with Spark and

RE: Direct IO with Spark and Hadoop over Cassandra

2014-09-16 Thread moshe.kranc
You will also have to read/resolve multiple row instances (if you update records) and tombstones (if you delete records) yourself. From: platon.tema [mailto:platon.t...@yandex.ru] Sent: Tuesday, September 16, 2014 1:51 PM To: user@cassandra.apache.org Subject: Re: Direct IO with Spark and Hadoop
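The read/resolve work moshe.kranc describes can be sketched with a toy model. This is an illustrative sketch only, not Cassandra's internal storage types: a hypothetical `Cell` record and a last-write-wins merge by timestamp, where a newer tombstone shadows the value it deletes.

```java
import java.util.*;

// Toy model of merging multiple on-disk versions of the same row:
// the newest cell per column wins, and a newer tombstone deletes the value.
public class RowReconcile {
    // Illustrative cell: column name, value (null marks a tombstone), write timestamp.
    record Cell(String column, String value, long timestamp) {
        boolean isTombstone() { return value == null; }
    }

    // Merge cells gathered from several sstable fragments into the live row view.
    static Map<String, String> reconcile(List<Cell> cells) {
        Map<String, Cell> newest = new HashMap<>();
        for (Cell c : cells) {
            Cell prev = newest.get(c.column());
            if (prev == null || c.timestamp() > prev.timestamp()) {
                newest.put(c.column(), c);
            }
        }
        Map<String, String> row = new TreeMap<>();
        for (Cell c : newest.values()) {
            if (!c.isTombstone()) row.put(c.column(), c.value()); // drop deleted columns
        }
        return row;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
            new Cell("name", "old", 1),
            new Cell("name", "new", 2),    // update: newer version wins
            new Cell("email", "a@b.c", 1),
            new Cell("email", null, 3));   // tombstone: deletion shadows the value
        System.out.println(reconcile(cells)); // {name=new}
    }
}
```

A direct-IO reader has to do something like this for every row, which is exactly the logic a CQL read path performs for you.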

Re: Direct IO with Spark and Hadoop over Cassandra

2014-09-16 Thread platon.tema
Thanks. But 1) can be overcome with the C* API for the commitlog and memtables, or with mixed access (direct IO plus traditional connectors, or pure CQL if the data model allows; we experimented with this). 2) is harder to solve in a universal way. In our case C* is used without replication (RF=1) because of huge data

Re: Direct IO with Spark and Hadoop over Cassandra

2014-09-16 Thread DuyHai Doan
If you access the C* sstables directly from those frameworks, you will: 1) miss live data that is still in memory and not yet flushed to disk 2) skip the Dynamo layer of C*, which is responsible for data consistency On 16 Sept. 2014 10:58, "platon.tema" wrote: > Hi. > > As I see massive data processing too
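DuyHai Doan's first point can be illustrated with a toy LSM sketch (hypothetical class and method names, not Cassandra's actual storage code): writes land in an in-memory memtable and only reach an sstable on flush, so a reader that scans the on-disk sstables alone misses the unflushed writes.

```java
import java.util.*;

// Toy LSM store: writes go to a memtable first and reach an sstable only on
// flush. Scanning the sstables alone misses anything not yet flushed.
public class ToyLsm {
    private final Map<String, String> memtable = new HashMap<>();
    private final List<Map<String, String>> sstables = new ArrayList<>();

    void write(String key, String value) { memtable.put(key, value); }

    void flush() { // dump the memtable to a new immutable sstable
        sstables.add(new HashMap<>(memtable));
        memtable.clear();
    }

    // What a direct sstable reader (Spark/Hadoop over the data files) sees.
    Set<String> sstableKeys() {
        Set<String> keys = new HashSet<>();
        for (Map<String, String> t : sstables) keys.addAll(t.keySet());
        return keys;
    }

    // What a read through the node sees: memtable merged with sstables.
    Set<String> liveKeys() {
        Set<String> keys = sstableKeys();
        keys.addAll(memtable.keySet());
        return keys;
    }

    public static void main(String[] args) {
        ToyLsm db = new ToyLsm();
        db.write("k1", "v1");
        db.flush();              // k1 is now on disk
        db.write("k2", "v2");    // k2 exists only in memory
        System.out.println(db.sstableKeys()); // [k1] -- direct IO misses k2
        System.out.println(db.liveKeys());    // k1 and k2
    }
}
```

The second point (skipping the Dynamo layer) matters less in platon.tema's RF=1 setup, since with a single replica there are no peer copies to reconcile at read time.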