A couple of things to add here:
1) you can import the implicit conversion to
org.apache.spark.streaming.dstream.PairDStreamFunctions (on Spark 1.0/1.1
that means import org.apache.spark.streaming.StreamingContext._), which adds
a whole ton of pair-oriented functionality to the DStream itself. This lets
you work at the DStream level rather than digging into the underlying RDDs.
2) you can use ssc.fileStream(directory) to monitor a directory and get a
DStream of whatever files land in it. A rough sketch combining both points
is below.
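This is only a minimal sketch, assuming Spark 1.x and that the files are
sequence files of (Text, IntWritable) written by another job; the path and
batch interval are placeholders:

import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
// Implicit conversion to PairDStreamFunctions (needed on Spark 1.0/1.1;
// later releases pull it in automatically).
import org.apache.spark.streaming.StreamingContext._

val ssc = new StreamingContext(new SparkConf().setAppName("dir-monitor"), Seconds(10))

// Each batch contains the files that appeared in the directory during that
// interval. Key/value/input-format types must match what the writer produced.
val files = ssc.fileStream[Text, IntWritable, SequenceFileInputFormat[Text, IntWritable]](
  "tachyon://localhost:19998/wordcounts")  // placeholder path
val counts = files.map { case (word, n) => (word.toString, n.get) }  // copy out of the reused Writables

// With the pair implicits in scope, reduceByKey and friends work on the DStream itself.
counts.reduceByKey(_ + _).print()

ssc.start()
ssc.awaitTermination()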
Thanks Sean! I got that working last night, similar to how you solved it. Any
ideas about how to monitor that same folder in another script by creating a
stream? I can use sc.sequenceFile() to read in the RDD, but how do I get the
name of the file that got added, since there is no sequenceFileStream()?
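For context, the batch read I have now looks roughly like this (the path is
just a placeholder):

import org.apache.spark.SparkContext
// Writable converters behind sequenceFile[String, Int] (needed on Spark 1.0/1.1).
import org.apache.spark.SparkContext._

def readCounts(sc: SparkContext): Unit = {
  // Reads the sequence files written by saveAsSequenceFile; Text/IntWritable
  // come back as String/Int.
  val counts = sc.sequenceFile[String, Int]("tachyon://localhost:19998/wordcounts")  // placeholder path
  counts.take(10).foreach(println)
}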
What about simply:
dstream.foreachRDD(_.saveAsSequenceFile(...))
?
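Spelled out a bit, a minimal sketch under the assumptions in your mail below
(socket source, word count, Tachyon output; host, port, and paths are
placeholders):

import org.apache.spark.SparkConf
// For saveAsSequenceFile on the pair RDD (Spark 1.0/1.1).
import org.apache.spark.SparkContext._
import org.apache.spark.streaming.{Seconds, StreamingContext}
// Implicit conversion to PairDStreamFunctions for reduceByKey (Spark 1.0/1.1).
import org.apache.spark.streaming.StreamingContext._

val ssc = new StreamingContext(new SparkConf().setAppName("wordcount"), Seconds(10))

val counts = ssc.socketTextStream("localhost", 9999)  // placeholder host/port
  .flatMap(_.split("\\s+"))
  .map((_, 1))
  .reduceByKey(_ + _)

// One directory of sequence files per batch; suffix with the batch time so
// successive batches don't collide.
counts.foreachRDD { (rdd, time) =>
  rdd.saveAsSequenceFile("tachyon://localhost:19998/wordcounts-" + time.milliseconds)
}

ssc.start()
ssc.awaitTermination()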
On Tue, Jul 22, 2014 at 2:06 AM, Barnaby wrote:
> First of all, I do not know Scala, but learning.
>
> I'm doing a proof of concept by streaming content from a socket, counting
> the words, and writing the result to a Tachyon disk. A diffe