Hi, I am trying to extract the names of the files from which a DStream is generated by parsing the output of the toDebugString method on each RDD. I am running the following code in spark-shell:

    import org.apache.spark.streaming.{StreamingContext, Seconds}

    val ssc = new StreamingContext(sc, Seconds(10))
    val lines = ssc.textFileStream(// directory //)

    def g: List[String] = {
      var res = List[String]()
      lines.foreachRDD { rdd =>
        if (rdd.count > 0) {
          val files = rdd.toDebugString.split("\n").filter(_.contains(":\\"))
          files.foreach { ms =>
            res = ms.split(" ")(2) :: res
          }
        }
      }
      res
    }

    g.foreach(x => { println(x); println("************") })

However, when I run the code, nothing gets printed on the console apart from the logs. Am I doing something wrong? And is there a better way to extract the file names from a DStream?

Thanks in advance,
Animesh
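
P.S. To make the parsing step itself easier to check, here is a minimal sketch of it in plain Scala, with no Spark involved. The object name DebugStringPaths, the helper extractPaths, the "file:" marker, and the sample text are all hypothetical and only for illustration; real toDebugString output may look different depending on the Spark version. It also picks the token that contains the marker instead of using a fixed index such as (2), which is a change from the code above.

```scala
// Sketch only: extract path-like tokens from a multi-line debug string.
// All names and the sample input are hypothetical, not Spark API.
object DebugStringPaths {

  // Keep lines containing `marker`, then take the first whitespace-separated
  // token on each such line that contains the marker.
  def extractPaths(debug: String, marker: String): List[String] =
    debug
      .split("\n")
      .toList
      .filter(_.contains(marker))
      .flatMap(_.trim.split("\\s+").find(_.contains(marker)))

  def main(args: Array[String]): Unit = {
    // Made-up sample input, not real Spark output.
    val sample =
      """header line without a path
        |  file:/data/in/a.txt SomeRDD[1]
        |  file:/data/in/b.txt SomeRDD[2]""".stripMargin

    extractPaths(sample, "file:").foreach(println)
  }
}
```

Running main on the made-up sample prints the two path tokens, one per line. Whether the same marker and tokenization fit the actual debug string would need to be checked against real output.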