What about simply: dstream.foreachRDD(_.saveAsSequenceFile(...))
?

On Tue, Jul 22, 2014 at 2:06 AM, Barnaby <bfa...@outlook.com> wrote:
> First of all, I do not know Scala, but I am learning.
>
> I'm doing a proof of concept: streaming content from a socket, counting
> the words, and writing the result to a Tachyon disk. A different script
> will read the file stream and print out the results.
>
>     val lines = ssc.socketTextStream(args(0), args(1).toInt,
>       StorageLevel.MEMORY_AND_DISK_SER)
>     val words = lines.flatMap(_.split(" "))
>     val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
>     wordCounts.saveAs???Files("tachyon://localhost:19998/files/WordCounts")
>     ssc.start()
>     ssc.awaitTermination()
>
> I already did a proof of concept writing and reading sequence files, but
> there doesn't seem to be a saveAsSequenceFiles() method on DStream. What
> is the best way to write out the RDDs of a stream so that the timestamps
> are in the filenames and there is minimal overhead in reading the data
> back in as "objects"? See below.
>
> My simple successful proof was the following:
>
>     val rdd = sc.parallelize(Array(("a", 2), ("b", 3), ("c", 1)))
>     rdd.saveAsSequenceFile("tachyon://.../123.sf2")
>     val rdd2 = sc.sequenceFile[String, Int]("tachyon://.../123.sf2")
>
> How can I do something similar with streaming?
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/saveAsSequenceFile-for-DStream-tp10369.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
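To spell that out a bit: `foreachRDD` also has a two-argument overload that passes in the batch `Time`, which covers the "timestamps in the filenames" part of the question. A sketch, assuming `wordCounts` is the pair DStream built in the quoted code and that the Tachyon path is the one from the example:

    import org.apache.spark.streaming.Time

    // The (rdd, time) overload of foreachRDD exposes the batch timestamp,
    // so each batch can be written to its own directory, mirroring what
    // saveAsTextFiles(prefix) does for text output.
    wordCounts.foreachRDD { (rdd, time: Time) =>
      if (rdd.take(1).nonEmpty) {  // skip batches with no data
        rdd.saveAsSequenceFile(
          s"tachyon://localhost:19998/files/WordCounts-${time.milliseconds}")
      }
    }

    // Reading one batch back works as in the original proof of concept:
    // val rdd2 = sc.sequenceFile[String, Int](
    //   "tachyon://localhost:19998/files/WordCounts-<timestamp>")

`saveAsSequenceFile` is available here through the same `SequenceFileRDDFunctions` implicit conversion that made the non-streaming proof of concept work, since the RDD's elements are (String, Int) pairs.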