Check out StreamingContext.queueStream (
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext
)

dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com

On Mon, Oct 26, 2015 at 11:16 AM, Anfernee Xu <[email protected]> wrote:

> Hi,
>
> Here's my situation, I have some kind of offline dataset and got them
> loaded them into Spark as RDD, but I want to form a virtual data stream
> feeding to Spark Streaming, my code looks like this
>
>
>    // sort offline data by time, the dataset spans 2 hours
>  1)  JavaRDD sortedByTime = offlineDataRDD.sortBy( );
>
>    // compute a list of JavaRDD,  each element JavaRDD is hosting the data
> in the same time
>    // bucket, for example 5 minutes
>   2) List<JavaRDD> virtualStreamRdd = ?
>
>     Queue<JavaRDD<Row>> queue = Queues.newLinkedBlockingQueue();
>     queue.addAll(virtualStreamRdd);
>
>     /*
>      * Create DStream from the queue
>      */
>
>     3) final JavaDStream<Row> rowDStream =
> streamingContext.queueStream(queue);
>
>
> Currently I'm stucking in 2), any suggestion is appreciated.
>
> Thanks
>
> --
> --Anfernee
>

Reply via email to