The file stream does not use a receiver, so the reliable/unreliable receiver distinction does not apply to it. Maybe that was not clear in the programming guide. I am updating the guide for the 1.3 release right now and will make it clearer. The file stream also has full reliability. See this section of the programming guide: http://spark.apache.org/docs/latest/streaming-programming-guide.html#semantics-with-files-as-input-source
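To illustrate why no receiver is needed, here is a minimal sketch in plain Python. It is not Spark code and not how Spark implements the file stream internally; the function names (`files_for_batch`, `read_batch`, `demo`) and the fixed timestamps are assumptions made up for the example. The idea it shows is that each batch is defined by the files that appeared in the directory during a time window, so after a failure the same batch can be rebuilt by reading the same files again, as long as they are still in the file system.

```python
import os
import tempfile


def files_for_batch(directory, window_start, window_end):
    """Return the files whose modification time falls in [window_start, window_end)."""
    selected = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and window_start <= os.path.getmtime(path) < window_end:
            selected.append(path)
    return selected


def read_batch(paths):
    """Read every line of every file in the batch; nothing is buffered by a receiver."""
    lines = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            lines.extend(line.rstrip("\n") for line in handle)
    return lines


def demo():
    """Build one batch, then rebuild it as if recovering from a failure."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "part-0000.txt")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("alpha\nbeta\n")
        # Pin the modification time so the example is deterministic (hypothetical value).
        os.utime(path, (105.0, 105.0))

        first = read_batch(files_for_batch(directory, 100.0, 110.0))
        # Simulated recovery: the batch is recomputed from the files themselves.
        replay = read_batch(files_for_batch(directory, 100.0, 110.0))
        # A later window sees no new files, so its batch is empty.
        later = read_batch(files_for_batch(directory, 110.0, 120.0))
        return first, replay, later
```

Calling `demo()` returns the original batch, the replayed batch, and the batch for the following window; the first two are identical because both are derived from the same stored file.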

On Wed, Mar 4, 2015 at 2:14 AM, Emre Sevinc <emre.sev...@gmail.com> wrote:
> Is FileInputDStream returned by fileStream method a reliable receiver?
>
> In the Spark Streaming Guide it says:
>
> "There can be two kinds of data sources based on their *reliability*.
> Sources (like Kafka and Flume) allow the transferred data to be
> acknowledged. If the system receiving data from these *reliable* sources
> acknowledge the received data correctly, it can be ensured that no data
> gets lost due to any kind of failure. This leads to two kinds of receivers.
>
> 1. *Reliable Receiver* - A *reliable receiver* correctly acknowledges
> a reliable source that the data has been received and stored in Spark with
> replication.
> 2. *Unreliable Receiver* - These are receivers for sources that do not
> support acknowledging. Even for reliable sources, one may implement an
> unreliable receiver that do not go into the complexity of acknowledging
> correctly."
>
>
> So I wonder whether the receivers for HDFS (and local file system) are
> reliable, e.g. when I'm using fileStream method to process files in a
> directory locally or on HDFS?
>
>
> --
> Emre Sevinç
>