The following patch to FileInputDStream should handle the situation you encountered: it rejects any file that still carries the "._COPYING_" suffix HDFS appends while a copy is in flight, so the batch only picks the file up once the copy completes and the suffix is dropped.

diff --git a/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala b/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
index ed93058..f79420b 100644
--- a/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
+++ b/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
@@ -266,6 +266,10 @@ class FileInputDStream[K, V, F <: NewInputFormat[K, V]](
       logDebug(s"$pathStr already considered")
       return false
     }
+    if (pathStr.endsWith("._COPYING_")) {
+      logDebug(s"$pathStr is being copied")
+      return false
+    }
     logDebug(s"$pathStr accepted with mod time $modTime")
     return true
   }
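If you'd rather not patch Spark itself, the same filtering can be done from application code: StreamingContext.fileStream takes a filter function (Path => Boolean) that decides which files each batch considers. Below is a minimal sketch of that approach; the directory path, app name, and batch interval are placeholders, and the key/value/format types shown are the ones textFileStream uses internally.

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object SkipCopyingFiles {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("SkipCopyingFiles")
        val ssc = new StreamingContext(conf, Seconds(10))

        // Reject files still carrying HDFS's in-flight "._COPYING_" suffix.
        def notCopying(path: Path): Boolean =
          !path.getName.endsWith("._COPYING_")

        val lines = ssc
          .fileStream[LongWritable, Text, TextInputFormat](
            "/user/hadoop/input", notCopying _, newFilesOnly = true)
          .map(_._2.toString)

        lines.print()
        ssc.start()
        ssc.awaitTermination()
      }
    }

This keeps the workaround local to your job instead of requiring a custom Spark build.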
On Wed, May 18, 2016 at 2:06 AM, Yogesh Vyas <informy...@gmail.com> wrote:
> Hi,
> I am trying to read the files in a streaming way using Spark
> Streaming. For this I am copying files from my local folder to the
> source folder from where Spark reads the file.
> After reading and printing some of the files, it gives the following error:
>
> Caused by:
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException):
> File does not exist: /user/hadoop/file17.xml._COPYING_
>
> I guess Spark Streaming is trying to read the file before it
> gets copied completely.
>
> Does anyone know how to handle this type of exception?
>
> Regards,
> Yogesh
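A third option avoids the race entirely: write the file somewhere outside the watched directory on the same filesystem, then rename it in. A rename within one filesystem is atomic, so FileInputDStream only ever sees completed files (on HDFS the equivalent is `hdfs dfs -put` into a staging directory followed by `hdfs dfs -mv` into the watched one). Below is a local-filesystem sketch of that pattern; the directory and file names are illustrative.

    set -e
    watch=$(mktemp -d)   # stand-in for the directory Spark watches
    stage=$(mktemp -d)   # staging dir on the same filesystem

    # Write the file completely outside the watched directory...
    echo "<xml/>" > "$stage/file17.xml"

    # ...then move it in. mv within one filesystem is an atomic rename,
    # so a watcher never observes a partially written file.
    mv "$stage/file17.xml" "$watch/file17.xml"
    ls "$watch"

Note that `hdfs dfs -put` itself is what creates the temporary "._COPYING_" name: it copies to `<name>._COPYING_` and renames on completion, which is why filtering that suffix (as in the patch above) also works.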