The following should handle the situation you encountered:
diff --git a/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala b/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.sca
index ed93058..f79420b 100644
--- a/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
+++ b/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
@@ -266,6 +266,10 @@ class FileInputDStream[K, V, F <: NewInputFormat[K, V]](
         logDebug(s"$pathStr already considered")
         return false
       }
+      if (pathStr.endsWith("._COPYING_")) {
+        logDebug(s"$pathStr is being copied")
+        return false
+      }
       logDebug(s"$pathStr accepted with mod time $modTime")
       return true
     }
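
If you would rather not patch Spark, a similar check can probably be done
from the application side, since StreamingContext.fileStream accepts a
custom path filter. The following is only a rough sketch: the object name,
the 10-second batch interval and the use of TextInputFormat are my
assumptions, and the directory /user/hadoop is taken from the path in your
error message. It assumes the job is launched with spark-submit.

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical application object, for illustration only.
object SkipCopyingFiles {
  // Suffix seen in the reported error while the copy is still in progress.
  val CopyingSuffix = "._COPYING_"

  // Accept a path only if it is neither an in-progress copy nor a hidden
  // file (hidden files are what Spark's default filter excludes).
  def acceptPath(path: Path): Boolean = {
    val name = path.getName
    !name.endsWith(CopyingSuffix) && !name.startsWith(".")
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SkipCopyingFiles")
    // Batch interval of 10 seconds is an assumption; adjust as needed.
    val ssc = new StreamingContext(conf, Seconds(10))

    // Directory assumed from the error message in the original post.
    val lines = ssc
      .fileStream[LongWritable, Text, TextInputFormat](
        "/user/hadoop", acceptPath _, newFilesOnly = true)
      .map { case (_, text) => text.toString }

    lines.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that the Spark Streaming guide asks for files to appear in the
monitored directory atomically, by moving or renaming them into it, so
copying to a staging directory on the same file system and then renaming
is likely the more robust approach either way.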
On Wed, May 18, 2016 at 2:06 AM, Yogesh Vyas <[email protected]> wrote:
> Hi,
> I am trying to read files as a stream using Spark Streaming. To do
> this, I copy files from my local folder into the source folder that
> Spark reads from.
> After reading and printing some of the files, it gives the following error:
>
> Caused by:
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException):
> File does not exist: /user/hadoop/file17.xml._COPYING_
>
> I guess Spark Streaming is trying to read the file before it has been
> copied completely.
>
> Does anyone know how to handle this type of exception?
>
> Regards,
> Yogesh