So after pulling my hair out for a bit trying to convert one of my standard Spark jobs to streaming, I found that FileInputDStream does not support nested folders (see the brief mention under "basic sources" here: http://spark.apache.org/docs/latest/streaming-programming-guide.html#basic-sources; the fileStream method returns a FileInputDStream).
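For reference, here is roughly how I'm creating the stream (a minimal sketch; the bucket name, app name, and batch interval are made up). As far as I understand, textFileStream is just fileStream with the text input format filled in, so it goes through FileInputDStream too:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("LogStream")
val ssc = new StreamingContext(conf, Seconds(60))

// Only picks up files that land directly in the monitored directory;
// files under dated subfolders like 2015/03/02/ are never seen.
val logs = ssc.textFileStream("s3n://mybucket/")
logs.foreachRDD(rdd => println(s"got ${rdd.count()} lines"))

ssc.start()
ssc.awaitTermination()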
Before, for my standard batch job, I was reading from, say,

s3n://mybucket/2015/03/02/*log

and I could also modify it to pull an entire month's worth of logs. Since the logs are split up by date, when the batch ran for the day I simply passed the date in as a parameter to make sure I was reading the correct data.

But since I want to turn this job into a streaming job, I need to do something like

s3n://mybucket/*log

This would work fine if it were a standard Spark application, but it fails for streaming. Is there any way I can get around this limitation?
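One partial idea I've tried (sketch only; it reuses the ssc from the snippet above, and the date list is hard-coded for illustration) is to create one stream per dated subfolder and union them:

import org.apache.spark.streaming.dstream.DStream

// Dates must be known when the context starts; a folder that
// appears tomorrow is never monitored.
val days = Seq("2015/03/01", "2015/03/02", "2015/03/03")
val perDay: Seq[DStream[String]] =
  days.map(d => ssc.textFileStream(s"s3n://mybucket/$d/"))
val allLogs: DStream[String] = ssc.union(perDay)

But since the union is fixed at startup, this never picks up new date folders, so I'm hoping there's a cleaner option than restarting the streaming context every day.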