If you have enabled checkpointing, Spark will handle that for you.
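Something along these lines should work (an untested sketch; the checkpoint directory, app name and batch interval below are placeholders, and the DStream setup has to live inside the factory function so it can be recovered from the checkpoint):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Placeholder path -- point this at a reliable filesystem (e.g. HDFS).
val checkpointDir = "hdfs:///tmp/streaming-checkpoint"

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("AvroFolderStream") // placeholder name
  val ssc = new StreamingContext(conf, Seconds(60))         // placeholder interval
  ssc.checkpoint(checkpointDir)
  // ... define your fileStream and transformations here ...
  ssc
}

// On a clean start this calls createContext(); after a crash it rebuilds
// the context, including batches that were scheduled but not yet run,
// from the checkpoint data.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()

Thanks
Best Regards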
On Thu, Aug 27, 2015 at 4:21 PM, Masf <masfwo...@gmail.com> wrote:

> Thanks Akhil, I will have a look.
>
> I have a doubt regarding Spark Streaming and fileStream. If Spark
> Streaming crashes, and new files are created in the input folder while
> it is down, how can I process those files when Spark Streaming is
> launched again?
>
> Thanks.
> Regards.
> Miguel.
>
>
> On Thu, Aug 27, 2015 at 12:29 PM, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
>> Have a look at Spark Streaming. You can make use of
>> ssc.fileStream.
>>
>> Eg:
>>
>> val avroStream = ssc.fileStream[AvroKey[GenericRecord], NullWritable,
>>   AvroKeyInputFormat[GenericRecord]](input)
>>
>> You can also specify a filter function
>> <http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext>
>> as the second argument.
>>
>> Thanks
>> Best Regards
>>
>> On Wed, Aug 19, 2015 at 10:46 PM, Masf <masfwo...@gmail.com> wrote:
>>
>>> Hi.
>>>
>>> I'd like to read Avro files using this library:
>>> https://github.com/databricks/spark-avro
>>>
>>> I need to load several files from a folder, not all of them. Is there
>>> some functionality to filter which files are loaded?
>>>
>>> And... is it possible to know the names of the files loaded from a
>>> folder?
>>>
>>> My problem is that I have a folder where an external process is
>>> inserting files every X minutes, and I need to process these files
>>> exactly once; I can't move, rename or copy the source files.
>>>
>>> Thanks
>>> --
>>> Regards
>>> Miguel Ángel
>>
>
> --
> Regards.
> Miguel Ángel
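For completeness, the filter mentioned above goes into an overload of fileStream; something like this might do (untested sketch: avroOnly is an illustrative name, newFilesOnly = false additionally picks up files already sitting in the folder within the remember window, and ssc/input are as in the snippet quoted above):

import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.NullWritable

// Illustrative filter: only pick up *.avro files; adjust the predicate to
// match however your external process names files that are fully written.
def avroOnly(path: Path): Boolean = path.getName.endsWith(".avro")

val avroStream = ssc.fileStream[AvroKey[GenericRecord], NullWritable,
  AvroKeyInputFormat[GenericRecord]](input, avroOnly _, newFilesOnly = false)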