Re: Manipulating RDDs within a DStream

2014-10-31 Thread Helena Edelson
Hi Harold, This is a great use case, and here is how you could do it, for example, with Spark Streaming: Using a Kafka stream: https://github.com/killrweather/killrweather/blob/master/killrweather-app/src/main/scala/com/datastax/killrweather/KafkaStreamingActor.scala#L50 Save raw data to Cassand

Re: Manipulating RDDs within a DStream

2014-10-31 Thread Helena Edelson
Hi Harold, Yes, that is the problem :) Sorry for the confusion, I will make this clear in the docs ;) since master is work for the next version. All you need to do is use Spark 1.1.0 as you have it already "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.0-beta1" and assembly - not fr

Re: Manipulating RDDs within a DStream

2014-10-31 Thread Harold Nguyen
Thanks Lalit and Helena, What I'd like to do is manipulate the values within a DStream like this: DStream.foreachRDD( rdd => { val arr = rdd.toArray }) I'd then like to be able to insert results from the arr back into Cassandra, after I've manipulated the arr array. However, for all

Re: Manipulating RDDs within a DStream

2014-10-31 Thread Harold Nguyen
Hi Helena, Thanks very much ! I'm using Spark 1.1.0, and spark-cassandra-connector-assembly-1.2.0-SNAPSHOT Best wishes, Harold On Fri, Oct 31, 2014 at 10:31 AM, Helena Edelson < helena.edel...@datastax.com> wrote: > Hi Harold, > Can you include the versions of spark and spark-cassandra-connect

Re: Manipulating RDDs within a DStream

2014-10-31 Thread Helena Edelson
Hi Harold, Can you include the versions of spark and spark-cassandra-connector you are using? Thanks! Helena @helenaedelson On Oct 30, 2014, at 12:58 PM, Harold Nguyen wrote: > Hi all, > > I'd like to be able to modify values in a DStream, and then send it off to an > external source like C

Re: Manipulating RDDs within a DStream

2014-10-30 Thread lalit1303
Hi, Since the Cassandra object is not serializable, you can't open the connection at the driver level and access the object inside foreachRDD (i.e. at the worker level). You have to open the connection inside foreachRDD, perform the operation, and then close the connection. For example: wordCounts.fore
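
The open-use-close pattern described in this reply can be sketched in plain Scala as follows. This is only an illustration under stated assumptions: `Sink`, `MemorySink`, `PartitionWriter` and `openSink` are hypothetical names standing in for a non-serializable client, not part of Spark or the spark-cassandra-connector. The Spark wiring is shown as a comment so the block compiles without a cluster.

```scala
// Hypothetical stand-in for a non-serializable client (e.g. a database session).
// This is NOT the spark-cassandra-connector API.
trait Sink {
  def write(key: String, value: Int): Unit
  def close(): Unit
}

// In-memory sink, used only to exercise the helper without a cluster.
final class MemorySink extends Sink {
  val rows = scala.collection.mutable.ArrayBuffer.empty[(String, Int)]
  var closed = false
  def write(key: String, value: Int): Unit = { rows += ((key, value)) }
  def close(): Unit = { closed = true }
}

object PartitionWriter {
  // Meant to run on a worker: the connection is opened here, used for every
  // record in the partition, and always closed, even if a write fails.
  // Returns the number of records written.
  def writePartition(records: Iterator[(String, Int)], open: () => Sink): Int = {
    val sink = open()
    try {
      var written = 0
      records.foreach { case (k, v) =>
        sink.write(k, v)
        written += 1
      }
      written
    } finally {
      sink.close()
    }
  }
}

// Wiring inside a streaming job would look roughly like this, assuming
// wordCounts is a DStream[(String, Int)] and openSink() is your own factory:
//
//   wordCounts.foreachRDD { rdd =>
//     rdd.foreachPartition { partition =>
//       PartitionWriter.writePartition(partition, () => openSink())
//     }
//   }
```

Only the factory function is captured by the closure shipped to the workers; the connection object itself is created on the worker, so it never has to be serialized. Opening one connection per partition rather than per record keeps the connection overhead bounded.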

Re: Manipulating RDDs within a DStream

2014-10-30 Thread Harold Nguyen
Hi, Sorry, there's a typo there: val arr = rdd.toArray Harold On Thu, Oct 30, 2014 at 9:58 AM, Harold Nguyen wrote: > Hi all, > > I'd like to be able to modify values in a DStream, and then send it off to > an external source like Cassandra, but I keep getting Serialization errors > and am n