Re: Manipulating RDDs within a DStream

Helena Edelson Fri, 31 Oct 2014 11:00:01 -0700

Hi Harold,
This is a great use case. Here is one way to do it with Spark Streaming:


Using a Kafka stream:
https://github.com/killrweather/killrweather/blob/master/killrweather-app/src/main/scala/com/datastax/killrweather/KafkaStreamingActor.scala#L50

Save raw data to Cassandra from that stream:
https://github.com/killrweather/killrweather/blob/master/killrweather-app/src/main/scala/com/datastax/killrweather/KafkaStreamingActor.scala#L56

Run any number of further computations on that streaming data (read from
Kafka, compute in Spark, write to Cassandra):
https://github.com/killrweather/killrweather/blob/master/killrweather-app/src/main/scala/com/datastax/killrweather/KafkaStreamingActor.scala#L69-L71
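
On your question about inserting at the worker level: the open/write/close
pattern has to run inside rdd.foreachPartition, because that closure executes
on the workers, while the body of foreachRDD itself runs on the driver. Below
is a minimal, Spark-free sketch of that pattern in plain Scala, only to show
where the connection lives. Assumptions: each inner Seq stands in for one RDD
partition, and FakeSession is a hypothetical in-memory stand-in for a
Cassandra session, not a real driver or connector class.

```scala
import scala.collection.mutable.ArrayBuffer

// HYPOTHETICAL stand-in for a Cassandra session; not a real driver or
// connector class. It appends rows to a shared buffer and rejects writes
// after close(), so using a closed session fails loudly.
final class FakeSession(sink: ArrayBuffer[(String, Int)]) {
  private var isOpen = true

  def insert(row: (String, Int)): Unit = {
    require(isOpen, "write on a closed session")
    sink += row
  }

  def close(): Unit = { isOpen = false }
}

object PerPartitionWrite {
  // Models rdd.foreachPartition: each inner Seq plays the role of one
  // partition, which Spark would process on some worker. A session is opened
  // once per partition (not per record, and not on the driver), used for
  // every record in that partition, and closed even if a write throws.
  // Returns the number of sessions opened.
  def writePartitions(
      partitions: Seq[Seq[(String, Int)]],
      openSession: () => FakeSession
  ): Int = {
    var opened = 0
    partitions.foreach { records =>
      val session = openSession()
      opened += 1
      try records.foreach(session.insert)
      finally session.close()
    }
    opened
  }
}
```

In a real job, the body of the partitions.foreach loop would sit inside
wordCounts.foreachRDD { rdd => rdd.foreachPartition { records => ... } },
with a real session in place of FakeSession. Opening one session per
partition rather than per record keeps connection overhead down. With the
spark-cassandra-connector, calling saveToCassandra on the RDD or DStream does
this connection management for you, so there is no need to collect to an
array or call sc.parallelize.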

I hope that helps; if not, I'll dig up another example.

- Helena
@helenaedelson

On Oct 31, 2014, at 1:37 PM, Harold Nguyen <[email protected]> wrote:

> Thanks, Lalit and Helena,
> 
> What I'd like to do is manipulate the values within a DStream like this:
> 
> dstream.foreachRDD { rdd =>
> 
>        // collect() pulls the whole batch back to the driver
>        val arr = rdd.collect()
> 
> }
> 
> After I've manipulated arr, I'd like to insert the results back into
> Cassandra. However, all the examples I've seen insert into Cassandra with
> something like:
> 
> val collection = sc.parallelize(Seq("foo", "bar"))
> 
> Here "foo" and "bar" could be elements of arr. So I would like to know how
> to insert into Cassandra at the worker level.
> 
> Best wishes,
> 
> Harold
> 
> On Thu, Oct 30, 2014 at 11:48 PM, lalit1303 <[email protected]> wrote:
> Hi,
> 
> Since the Cassandra connection object is not serializable, you can't open the
> connection on the driver and then use it inside RDD operations such as
> rdd.foreach, which run on the workers. (The body of foreachRDD itself runs
> on the driver.) Instead, open the connection inside rdd.foreachPartition,
> perform the operation, and then close the connection.
> 
> For example:
> 
>  wordCounts.foreachRDD { rdd =>
>    rdd.foreachPartition { records =>
>      // OPEN Cassandra connection (this runs on the worker)
>      // store each record in records
>      // CLOSE Cassandra connection
>    }
>  }
> 
> 
> Thanks
> 
> 
> 
> -----
> Lalit Yadav
> [email protected]
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Manipulating-RDDs-within-a-DStream-tp17740p17800.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
