Re: Direct IO with Spark and Hadoop over Cassandra

2014-09-16 Thread platon.tema
Thanks. But 1) can be overcome with the C* API for the commitlog and memtables, or with mixed access (direct IO + traditional connectors, or pure CQL if the data model allows; we experimented with it). 2) is more complex for a universal solution. In our case C* is used without replication (RF=1

RE: Direct IO with Spark and Hadoop over Cassandra

2014-09-16 Thread moshe.kranc
Thanks. But 1) can be overcome with the C* API for the commitlog and memtables, or with mixed access (direct IO + traditional connectors, or pure CQL if the data model allows; we experimented with it). 2) is more complex for a universal solution. In our case C* is used without replication (RF=1

Re: Direct IO with Spark and Hadoop over Cassandra

2014-09-16 Thread platon.tema
Thanks. But 1) can be overcome with the C* API for the commitlog and memtables, or with mixed access (direct IO + traditional connectors, or pure CQL if the data model allows; we experimented with it). 2) is more complex for a universal solution. In our case C* is used without replication (RF=1) because of huge data

Re: Direct IO with Spark and Hadoop over Cassandra

2014-09-16 Thread DuyHai Doan
If you access the C* sstables directly from those frameworks, you will: 1) miss live data that is in memory and not yet flushed to disk 2) skip the Dynamo layer of C*, which is responsible for data consistency. On 16 Sept. 2014 10:58, "platon.tema" wrote: > Hi. > > As I see massive data processing too

Direct IO with Spark and Hadoop over Cassandra

2014-09-16 Thread platon.tema
Hi. As I see it, massive data processing tools (MapReduce) for C* data include connectors: - Calliope http://tuplejump.github.io/calliope/ - DataStax spark cassandra connector https://github.com/datastax/spark-cassandra-connector - Stratio Deep https://github.com/Stratio/stratio-deep - other free

Re: Timeout Errors while running Hadoop over Cassandra

2011-01-19 Thread Jairam Chandar
I was able to work around this problem by modifying the ColumnFamilyRecordReader class from the org.apache.cassandra.hadoop package. Since the errors were TimeoutException, I added sleep and retry logic around rows = client.get_range_slices(keyspace, new ColumnParent(cfName), predicate,
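
A minimal sketch of the sleep-and-retry idea described in the message above. It is written in Python rather than the Java of ColumnFamilyRecordReader, and it is not the poster's actual patch, which the snippet does not show. The names call_with_retry, max_attempts and base_delay_s, and the stand-in TimedOutException class, are assumptions made for illustration only.

```python
import time


class TimedOutException(Exception):
    """Hypothetical stand-in for the Thrift timeout error raised by the client."""


def call_with_retry(fetch, max_attempts=5, base_delay_s=1.0, sleep=time.sleep):
    """Call fetch() and retry on TimedOutException, sleeping between attempts.

    The delay grows linearly with the attempt number. Once max_attempts
    calls have timed out, the last exception is re-raised to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except TimedOutException:
            if attempt == max_attempts:
                raise
            sleep(base_delay_s * attempt)
```

In the record reader this would wrap the range-slice call, for example call_with_retry(lambda: fetch_rows()), where fetch_rows is a hypothetical function that performs the get_range_slices request. Retrying only hides the timeout; a larger RPC timeout or smaller range batches may still be needed.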

Re: Timeout Errors while running Hadoop over Cassandra

2011-01-14 Thread Jairam Chandar
The Cassandra logs strangely show no errors at the time of failure. Changing the RPCTimeoutInMillis seemed to help. Though it slowed the job down considerably, the job now seems to finish with the timeout value set to 1 min. Unfortunately, I cannot be sure if it will continue to work if the data in

Re: Timeout Errors while running Hadoop over Cassandra

2011-01-13 Thread Jeremy Hanna
On Jan 12, 2011, at 12:40 PM, Jairam Chandar wrote: > Hi folks, > > We have a Cassandra 0.6.6 cluster running in production. We want to run > Hadoop (version 0.20.2) jobs over this cluster in order to generate reports. > I modified the word_count example in the contrib folder of the cassandra

Re: Timeout Errors while running Hadoop over Cassandra

2011-01-12 Thread mck
On Wed, 2011-01-12 at 23:04 +0100, mck wrote: > > Caused by: TimedOutException() > > What is the exception in the cassandra logs? Or have you tried increasing rpc_timeout_in_ms? ~mck

Re: Timeout Errors while running Hadoop over Cassandra

2011-01-12 Thread mck
On Wed, 2011-01-12 at 18:40 +, Jairam Chandar wrote: > Caused by: TimedOutException() What is the exception in the cassandra logs? ~mck

Re: Timeout Errors while running Hadoop over Cassandra

2011-01-12 Thread Aaron Morton
What's happening in the Cassandra server logs when you get these errors? Reading through the hadoop 0.6.6 code it looks like it creates a thrift client with an infinite timeout. So it may be an internode timeout, which is set in storage-conf.xml. Aaron On 13 Jan, 2011, at 07:40 AM, Jairam Chandar wrot

Timeout Errors while running Hadoop over Cassandra

2011-01-12 Thread Jairam Chandar
Hi folks, We have a Cassandra 0.6.6 cluster running in production. We want to run Hadoop (version 0.20.2) jobs over this cluster in order to generate reports. I modified the word_count example in the contrib folder of the cassandra distribution. While the program is running fine for small datasets

Re: Hadoop over Cassandra

2010-05-18 Thread Jonathan Ellis
On Tue, May 18, 2010 at 9:40 PM, Mark Schnitzius wrote: >> If anyone has "war stories" on the topic of Cassandra & Hadoop (or >> even just Hadoop in general) let me know. > > Don't know if it counts as a war story, but I was successful recently in > implementing something I got advice on in an ear

Re: Hadoop over Cassandra

2010-05-18 Thread Mark Schnitzius
> > If anyone has "war stories" on the topic of Cassandra & Hadoop (or > even just Hadoop in general) let me know. Don't know if it counts as a war story, but I was successful recently in implementing something I got advice on in an earlier thread, namely feeding both a Cassandra table and a Had

Re: Hadoop over Cassandra

2010-05-18 Thread Joseph Stein
xim Grinev" > Sent: Tuesday, May 18, 2010 2:42am > To: user@cassandra.apache.org > Subject: Re: Hadoop over Cassandra > > On Tue, May 18, 2010 at 2:23 AM, Jonathan Ellis wrote: > >> On Mon, May 17, 2010 at 4:12 PM, Vick Khera wrote: >> > On Mon,

Re: Hadoop over Cassandra

2010-05-18 Thread Stu Hood
: "Maxim Grinev" Sent: Tuesday, May 18, 2010 2:42am To: user@cassandra.apache.org Subject: Re: Hadoop over Cassandra On Tue, May 18, 2010 at 2:23 AM, Jonathan Ellis wrote: > On Mon, May 17, 2010 at 4:12 PM, Vick Khera wrote: > > On Mon, May 17, 2010 at 3:46 PM, Jonathan Ellis >

Re: Hadoop over Cassandra

2010-05-18 Thread Ben Browning
Maxim, Check out the getLocation() method from this file: http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/hadoop/ColumnFamilyRecordReader.java Basically, it loops over the list of nodes containing this split of data and if any of them are the local node, it returns

Re: Hadoop over Cassandra

2010-05-18 Thread Maxim Grinev
On Tue, May 18, 2010 at 2:23 AM, Jonathan Ellis wrote: > On Mon, May 17, 2010 at 4:12 PM, Vick Khera wrote: > > On Mon, May 17, 2010 at 3:46 PM, Jonathan Ellis > wrote: > >> Moving to the user@ list. > >> > >> http://wiki.apache.org/cassandra/HadoopSupport should be useful. > > > > That documen

Re: Hadoop over Cassandra

2010-05-17 Thread Jonathan Ellis
On Mon, May 17, 2010 at 4:12 PM, Vick Khera wrote: > On Mon, May 17, 2010 at 3:46 PM, Jonathan Ellis wrote: >> Moving to the user@ list. >> >> http://wiki.apache.org/cassandra/HadoopSupport should be useful. > > That document doesn't really answer the "is data locality preserved" > when running t

Re: Hadoop over Cassandra

2010-05-17 Thread Vick Khera
On Mon, May 17, 2010 at 3:46 PM, Jonathan Ellis wrote: > Moving to the user@ list. > > http://wiki.apache.org/cassandra/HadoopSupport should be useful. That document doesn't really answer the "is data locality preserved" when running the map phase, but my hunch is "no". > > On Mon, May 17, 2010

Re: Hadoop over Cassandra

2010-05-17 Thread Jonathan Ellis
Moving to the user@ list. http://wiki.apache.org/cassandra/HadoopSupport should be useful. On Mon, May 17, 2010 at 2:41 PM, Yan Virin wrote: > Hi, > Can someone explain how this works? As far as I know, there is no execution > engine in Cassandra alone, so I assume that Hadoop gives the MapRedu