Re: saveAsSequenceFile for DStream

2014-08-30 Thread Chris Fregly
couple things to add here:
1) you can import the org.apache.spark.streaming.dstream.PairDStreamFunctions implicit which adds a whole ton of functionality to DStream itself. this lets you work at the DStream level versus digging into the underlying RDDs.
2) you can use ssc.fileStream(directory) t
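For the second point, a minimal sketch of monitoring a directory of sequence files with ssc.fileStream might look like the following. The directory /tmp/wordcounts, the 10-second batch interval, the local[2] master, and the Text/IntWritable key and value types are all assumptions for illustration; they are not taken from the thread.

```scala
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WatchSequenceFiles {
  def main(args: Array[String]): Unit = {
    // Assumed settings: local master and a 10-second batch interval.
    val conf = new SparkConf().setAppName("WatchSequenceFiles").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical directory; each new file that appears in it is read
    // as a sequence file of (Text, IntWritable) records.
    val records =
      ssc.fileStream[Text, IntWritable, SequenceFileInputFormat[Text, IntWritable]]("/tmp/wordcounts")

    // Hadoop record readers reuse Writable instances, so copy the
    // contents into plain Scala values before doing anything else.
    val counts = records.map { case (word, count) => (word.toString, count.get) }

    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

One caveat, as far as I know: fileStream only picks up files that appear directly in the monitored directory, so part files written into per-batch subdirectories may not be detected without moving them first.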

Re: saveAsSequenceFile for DStream

2014-07-22 Thread Barnaby Falls
Thanks Sean! I got that working last night similar to how you solved it. Any ideas about how to monitor that same folder in another script by creating a stream? I can use sc.sequenceFile() to read in the RDD, but how do I get the name of the file that got added since there is no sequenceFileStre

Re: saveAsSequenceFile for DStream

2014-07-22 Thread Sean Owen
What about simply: dstream.foreachRDD(_.saveAsSequenceFile(...)) ?
On Tue, Jul 22, 2014 at 2:06 AM, Barnaby wrote:
> First of all, I do not know Scala, but learning.
>
> I'm doing a proof of concept by streaming content from a socket, counting
> the words and write it to a Tachyon disk. A diffe
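Spelled out a little, the foreachRDD suggestion could be sketched as below for the socket word count described in the quoted message. The host, port, batch interval, and output prefix /tmp/wordcounts are assumptions (the original writes to Tachyon, whose path is not shown here). This sketch uses the two-argument form of foreachRDD so that each batch is saved under its own timestamped path rather than to one fixed path.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext._               // pair-RDD implicits on Spark 1.x
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext, Time}
import org.apache.spark.streaming.StreamingContext._ // pair-DStream implicits on Spark 1.x

object WordCountToSequenceFiles {
  def main(args: Array[String]): Unit = {
    // Assumed settings: local master and a 10-second batch interval.
    val conf = new SparkConf().setAppName("WordCountToSequenceFiles").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical socket source; replace host and port with your own.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Classic word count, computed independently for each batch.
    val counts = lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // Save every batch as a sequence file under a path that includes the
    // batch time, so successive batches do not collide with each other.
    counts.foreachRDD { (rdd: RDD[(String, Int)], time: Time) =>
      rdd.saveAsSequenceFile(s"/tmp/wordcounts/batch-${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Each call to saveAsSequenceFile creates a directory of part files, so a reader on the other side would see one directory per batch.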