So after pulling my hair out for a bit trying to convert one of my standard
Spark jobs to streaming, I found that FileInputDStream does not support
nested folders. There is a brief mention of this at
http://spark.apache.org/docs/latest/streaming-programming-guide.html#basic-sources
(the fileStream method returns a FileInputDStream). Previously, for my
standard batch job, I was reading from, say:

s3n://mybucket/2015/03/02/*log

I could also modify the path to pull an entire month's worth of logs.
Since the logs are split up by date, when the batch ran for the day I
simply passed the date in as a parameter to make sure I was reading the
correct data.
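
For reference, the batch version looked roughly like this (a minimal
sketch; the bucket name, app name, and argument handling are illustrative,
not my exact code):

    import org.apache.spark.{SparkConf, SparkContext}

    object DailyLogBatch {
      def main(args: Array[String]): Unit = {
        // The date partition to read, e.g. "2015/03/02", passed in per run
        val date = args(0)
        val sc = new SparkContext(new SparkConf().setAppName("DailyLogBatch"))
        // textFile expands glob patterns at job start, so the nested
        // date folders are no problem in batch mode
        val logs = sc.textFile(s"s3n://mybucket/$date/*log")
        println("Lines read: " + logs.count())
        sc.stop()
      }
    }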

But since I want to turn this job into a streaming job, I need to do
something like:

s3n://mybucket/*log

This would work fine in a standard Spark application, but it fails for
streaming.  Is there any way I can get around this limitation?
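
To be concrete, here is a sketch of the streaming version I am after (the
types and the filter argument are my best reading of the fileStream
signature; the bucket layout is as above):

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object LogFileStream {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("LogFileStream"), Seconds(60))
        // I want every *log file anywhere under the bucket, but fileStream
        // only watches the given directory itself; it never descends into
        // the nested date folders (2015/03/02/...), so nothing is picked up.
        val logs = ssc.fileStream[LongWritable, Text, TextInputFormat](
          "s3n://mybucket/",
          (path: Path) => path.getName.endsWith("log"),
          newFilesOnly = true)
        logs.map(_._2.toString).count().print()
        ssc.start()
        ssc.awaitTermination()
      }
    }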



