Hi,

I am trying to extract the names of the files from which a DStream is
generated by parsing the output of the toDebugString method on each RDD.
I am running the following code in spark-shell:

import org.apache.spark.streaming.{StreamingContext, Seconds}
val ssc = new StreamingContext(sc,Seconds(10))
val lines = ssc.textFileStream(// directory //)

def g : List[String] = {
   var res = List[String]()
   lines.foreachRDD{ rdd => {
      if(rdd.count > 0){
         val files = rdd.toDebugString.split("\n").filter(_.contains(":\\"))
         files.foreach { ms =>
            res = ms.split(" ")(2) :: res
         }
      }
   }}
   res
}

g.foreach(x => {println(x); println("************")})

However, when I run the code, nothing gets printed on the console apart from
the logs. Am I doing something wrong?
And is there a better way to extract the file names from a DStream?
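
One possible explanation, offered as a minimal sketch and not a confirmed
fix: foreachRDD only registers an output operation, and its body runs on the
driver once per batch after ssc.start() has been called. In the code above, g
returns res before any batch has run and the context is never started, so the
list would be empty. The sketch below prints from inside foreachRDD instead.
The helper extractPaths, the marker ":/" (which assumes URI-style paths such
as file:/...) and the directory /tmp/stream-input are hypothetical. The fixed
split(" ")(2) index is also swapped for a search over whitespace-separated
tokens, because the exact layout of the toDebugString output is an assumption
here.

```scala
import org.apache.spark.streaming.{StreamingContext, Seconds}

// Hypothetical helper: for each line of a debug string, keep the first
// whitespace-separated token that contains `marker`; other lines are dropped.
def extractPaths(debug: String, marker: String): List[String] =
  debug.split("\n").toList.flatMap(_.split("\\s+").find(_.contains(marker)))

// `sc` is the SparkContext that spark-shell provides.
val ssc = new StreamingContext(sc, Seconds(10))
val lines = ssc.textFileStream("/tmp/stream-input") // hypothetical directory

// The body below runs on the driver once per batch, after ssc.start().
lines.foreachRDD { rdd =>
  if (rdd.count > 0) {
    extractPaths(rdd.toDebugString, ":/").foreach { path =>
      println(path)
      println("************")
    }
  }
}

ssc.start()            // no batch is processed before this call
ssc.awaitTermination() // blocks the shell; interrupt to stop
```

Note that textFileStream only picks up files that appear in the monitored
directory after the stream has started, so files already present when
ssc.start() is called would not show up in any batch.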

Thanks in advance


Animesh
