If you have enabled checkpointing, Spark will handle that for you: recreate the StreamingContext with StreamingContext.getOrCreate and it will recover from the checkpoint directory and pick up where it left off.
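
A minimal sketch of driver recovery with getOrCreate (the checkpoint path,
app name and batch interval below are just placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("AvroFileStream")
  val ssc = new StreamingContext(conf, Seconds(60))
  // Metadata checkpointing is what enables recovery after a crash
  ssc.checkpoint("hdfs:///tmp/avro-checkpoint")
  // ... define your fileStream and output operations here ...
  ssc
}

// On a clean start this calls createContext(); after a crash it rebuilds
// the context (and its pending batches) from the checkpoint directory.
val ssc = StreamingContext.getOrCreate("hdfs:///tmp/avro-checkpoint",
  createContext _)
ssc.start()
ssc.awaitTermination()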

Thanks
Best Regards

On Thu, Aug 27, 2015 at 4:21 PM, Masf <masfwo...@gmail.com> wrote:

> Thanks Akhil, I will have a look.
>
> I have a doubt regarding Spark Streaming and fileStream. If Spark
> Streaming crashes and new files are created in the input folder while
> Spark is down, how can I process those files when Spark Streaming is
> launched again?
>
> Thanks.
> Regards.
> Miguel.
>
>
>
> On Thu, Aug 27, 2015 at 12:29 PM, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
>> Have a look at Spark Streaming. You can make use of
>> ssc.fileStream.
>>
>> Eg:
>>
>> // Watch the input directory for newly created Avro container files
>> val avroStream = ssc.fileStream[AvroKey[GenericRecord], NullWritable,
>>       AvroKeyInputFormat[GenericRecord]](input)
>>
>> You can also specify a filter function
>> <http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext>
>> as the second argument.
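>>
>> A quick sketch of that overload (the ".avro" suffix check is just an
>> example predicate; adjust it to your naming scheme):
>>
>> import org.apache.hadoop.fs.Path
>>
>> // Only pick up files ending in .avro; newFilesOnly = true ignores
>> // files that were already present when the context started.
>> val filteredStream = ssc.fileStream[AvroKey[GenericRecord], NullWritable,
>>       AvroKeyInputFormat[GenericRecord]](
>>     input,
>>     (path: Path) => path.getName.endsWith(".avro"),
>>     newFilesOnly = true)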
>>
>> Thanks
>> Best Regards
>>
>> On Wed, Aug 19, 2015 at 10:46 PM, Masf <masfwo...@gmail.com> wrote:
>>
>>> Hi.
>>>
>>> I'd like to read Avro files using this library
>>> https://github.com/databricks/spark-avro
>>>
>>> I need to load several files from a folder, not all files. Is there some
>>> functionality to filter the files to load?
>>>
>>> And... Is it possible to know the names of the files loaded from a folder?
>>>
>>> My problem is that I have a folder where an external process is
>>> inserting files every X minutes, and I need to process these files
>>> exactly once; I can't move, rename or copy the source files.
>>>
>>>
>>> Thanks
>>> --
>>>
>>> Regards
>>> Miguel Ángel
>>>
>>
>>
>
>
> --
>
>
> Regards.
> Miguel Ángel
>
