Are you not just looking for the foreachRDD() method on DStream? http://spark.apache.org/docs/0.9.1/streaming-programming-guide.html#output-operations
It gives you an RDD that you can do what you want with, including collect() it. On Thu, May 15, 2014 at 5:33 AM, Stephen Boesch <java...@gmail.com> wrote: > Looking further it appears the functionality I am seeking is in the > following private[spark] class ForEachdStream > > (version 0.8.1 , yes we are presently using an older release..) > > private[streaming] > class ForEachDStream[T: ClassManifest] ( > parent: DStream[T], > foreachFunc: (RDD[T], Time) => Unit > ) extends DStream[Unit](parent.ssc) { > > I would like to have access to this structure - particularly the ability to > define an "foreachFunc" that gets applied to each RDD within the DStream. > Is there a means to do so? > > > > 2014-05-14 21:25 GMT-07:00 Stephen Boesch <java...@gmail.com>: >> >> >> Given that collect() does not exist on DStream apparently my mental model >> of Streaming RDD (DStream) needs correction/refinement. So what is the >> means to convert DStream data into a JVM in-memory representation. All of >> the methods on DStream i.e. filter, map, transform, reduce, etc generate >> other DStream's, and not an in memory data structure. >> >> >> >