The following should handle the situation you encountered:

diff --git a/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala b/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
index ed93058..f79420b 100644
--- a/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
+++ b/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
@@ -266,6 +266,10 @@ class FileInputDStream[K, V, F <: NewInputFormat[K, V]](
       logDebug(s"$pathStr already considered")
       return false
     }
+    if (pathStr.endsWith("._COPYING_")) {
+      logDebug(s"$pathStr is being copied")
+      return false
+    }
     logDebug(s"$pathStr accepted with mod time $modTime")
     return true
   }
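
If patching Spark is not an option, the same effect can be had through the public API: `StreamingContext.fileStream` takes a `Path => Boolean` filter, so the stream itself can skip files that `hdfs dfs -put` is still writing (it copies to `<name>._COPYING_` and renames only when the copy completes). A minimal sketch — the directory path, app name, and text input types are placeholders for illustration:

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SkipInFlightCopies {
  // Reject files still being copied; HDFS renames "<name>._COPYING_"
  // to "<name>" atomically once the copy finishes.
  def isFullyCopied(path: Path): Boolean =
    !path.getName.endsWith("._COPYING_")

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SkipInFlightCopies")
    val ssc = new StreamingContext(conf, Seconds(10))

    // "/user/hadoop/input" stands in for the watched source directory.
    val lines = ssc
      .fileStream[LongWritable, Text, TextInputFormat](
        "/user/hadoop/input", isFullyCopied _, newFilesOnly = true)
      .map { case (_, text) => text.toString }

    lines.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Since the rename at the end of the copy is atomic on HDFS, another common workaround is to copy files into a staging directory on the same filesystem and `hdfs dfs -mv` each finished file into the watched directory, so Spark only ever sees complete files.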

On Wed, May 18, 2016 at 2:06 AM, Yogesh Vyas <informy...@gmail.com> wrote:

> Hi,
> I am trying to read the files in a streaming way using Spark
> Streaming. For this I am copying files from my local folder to the
> source folder from where spark reads the file.
> After reading and printing some of the files, it gives the following error:
>
> Caused by:
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException):
> File does not exist: /user/hadoop/file17.xml._COPYING_
>
> I guess Spark Streaming is trying to read the file before it has been
> copied completely.
>
> Does anyone know how to handle this type of exception?
>
> Regards,
> Yogesh
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
