First of all, I do not know Scala yet, but I am learning.

I'm doing a proof of concept: streaming content from a socket, counting
the words, and writing the counts to a Tachyon disk. A different script
will read the file stream and print out the results.

val lines = ssc.socketTextStream(args(0), args(1).toInt,
  StorageLevel.MEMORY_AND_DISK_SER)
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.saveAs???Files("tachyon://localhost:19998/files/WordCounts")
ssc.start()
ssc.awaitTermination()

I already did a proof of concept that writes and reads sequence files, but
there doesn't seem to be a saveAsSequenceFiles() method on DStream. What is
the best way to write each RDD of a stream out so that the batch timestamps
end up in the filenames, and so that there is minimal overhead in reading
the data back in as "objects"? See below.

My simple successful proof was the following:
val rdd = sc.parallelize(Array(("a", 2), ("b", 3), ("c", 1)))
rdd.saveAsSequenceFile("tachyon://.../123.sf2")
val rdd2 = sc.sequenceFile[String, Int]("tachyon://.../123.sf2")

How can I do something similar with streaming?
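My best guess so far (untested) is to drop down to foreachRDD and call the
RDD-level saveAsSequenceFile(), putting the batch time into the path myself.
The path below is just my local Tachyon setup; I'm not sure this is the
idiomatic approach:

```scala
// Assumption: foreachRDD gives access to each batch's RDD and its Time,
// so the RDD-level saveAsSequenceFile() can be reused per batch.
import org.apache.spark.SparkContext._  // implicits for saveAsSequenceFile on pair RDDs

wordCounts.foreachRDD { (rdd, time) =>
  // Use the batch timestamp (ms since epoch) in the output path,
  // mimicking what saveAsTextFiles() does with its prefix/suffix.
  rdd.saveAsSequenceFile(
    s"tachyon://localhost:19998/files/WordCounts-${time.milliseconds}")
}
```

Reading them back would then presumably be a matter of
sc.sequenceFile[String, Int](...) on each of those paths, as in my simple
proof above.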

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/saveAsSequenceFile-for-DStream-tp10369.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
