Hi Harold,

This is a great use case. Here is one way you could do it with Spark Streaming:
Using a Kafka stream:
https://github.com/killrweather/killrweather/blob/master/killrweather-app/src/main/scala/com/datastax/killrweather/KafkaStreamingActor.scala#L50

Save raw data to Cassandra from that stream:
https://github.com/killrweather/killrweather/blob/master/killrweather-app/src/main/scala/com/datastax/killrweather/KafkaStreamingActor.scala#L56

Do n-computations on that streaming data (reading from Kafka, computing in Spark, and writing to Cassandra):
https://github.com/killrweather/killrweather/blob/master/killrweather-app/src/main/scala/com/datastax/killrweather/KafkaStreamingActor.scala#L69-L71

I hope that helps, and if not I’ll dig up another.

- Helena
@helenaedelson

On Oct 31, 2014, at 1:37 PM, Harold Nguyen <[email protected]> wrote:

> Thanks Lalit, and Helena,
>
> What I'd like to do is manipulate the values within a DStream like this:
>
> DStream.foreachRDD( rdd => {
>
>    val arr = rdd.toArray
>
> })
>
> I'd then like to be able to insert results from the arr back into Cassandra,
> after I've manipulated the arr array.
> However, for all the examples I've seen, inserting into Cassandra is
> something like:
>
> val collection = sc.parallelize(Seq("foo", "bar"))
>
> Where "foo" and "bar" could be elements in the arr array. So I would like to
> know how to insert into Cassandra at the worker level.
>
> Best wishes,
>
> Harold
>
> On Thu, Oct 30, 2014 at 11:48 PM, lalit1303 <[email protected]>
> wrote:
> Hi,
>
> Since the Cassandra object is not serializable, you can't open the
> connection at the driver level and access the object inside foreachRDD (i.e. at
> the worker level).
> You have to open the connection inside foreachRDD, perform the operation,
> and then close the connection.
>
> For example:
>
> wordCounts.foreachRDD( rdd => {
>
>    val arr = rdd.toArray
>
>    OPEN cassandra connection
>    store arr
>    CLOSE cassandra connection
>
> })
>
>
> Thanks
>
>
>
> -----
> Lalit Yadav
> [email protected]
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Manipulating-RDDs-within-a-DStream-tp17740p17800.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
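
For the worker-level insert discussed in this thread, here is a minimal sketch of two options, assuming the DataStax spark-cassandra-connector is on the classpath. The keyspace `demo`, the table `word_counts`, its columns `word` and `total`, and the `manipulate` helper are all hypothetical names chosen for illustration; they are not taken from the killrweather code linked above. The first option lets the connector write each RDD directly; the second opens a session per partition on the workers, which is roughly the OPEN / store / CLOSE pattern from the quoted message.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.DStream
import com.datastax.spark.connector._                      // adds saveToCassandra to RDDs
import com.datastax.spark.connector.cql.CassandraConnector

object WordCountToCassandra {

  // Hypothetical manipulation applied to every record before it is stored.
  def manipulate(record: (String, Int)): (String, Int) =
    (record._1.toLowerCase, record._2)

  // Option 1: let the connector handle connections and write each RDD
  // from the workers. Assumes a table demo.word_counts(word text, total int).
  def save(wordCounts: DStream[(String, Int)]): Unit =
    wordCounts.foreachRDD { rdd =>
      rdd.map(manipulate)
        .saveToCassandra("demo", "word_counts", SomeColumns("word", "total"))
    }

  // Option 2: manage the session yourself, once per partition.
  // CassandraConnector is serializable, so it can be created on the driver
  // and used inside the closure; the session itself is opened on the worker.
  def saveManually(wordCounts: DStream[(String, Int)], conf: SparkConf): Unit = {
    val connector = CassandraConnector(conf)
    wordCounts.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        connector.withSessionDo { session =>
          val stmt = session.prepare(
            "INSERT INTO demo.word_counts (word, total) VALUES (?, ?)")
          records.map(manipulate).foreach { case (word, total) =>
            session.execute(stmt.bind(word, Int.box(total)))
          }
        }
      }
    }
  }
}
```

Either way there is no need to pull the data back to the driver with toArray first; the manipulation and the write both run where the partitions live.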
