Re: can HDFS be a streaming source like Kafka in Spark 2.2.0?

ayan guha Mon, 15 Jan 2018 21:11:21 -0800

http://spark.apache.org/docs/1.0.0/streaming-programming-guide.html#input-sources
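
To illustrate the "atomically copied/moved" requirement mentioned further down the thread, here is a minimal sketch of the write-then-rename pattern on a local POSIX filesystem. The function name publish_atomically and the staging_dir / watched_dir / name parameters are hypothetical, chosen only for illustration; this is not part of Spark or any HDFS API. The idea is that the file is written in full outside the monitored directory and only then renamed into it, so a reader never sees a partially written file. It also refuses to replace an existing file, since updating files in place is the case reported as unsupported.

```python
import os
import tempfile


def publish_atomically(data: bytes, staging_dir: str, watched_dir: str, name: str) -> str:
    """Write data in a staging directory, then rename it into the watched directory.

    Assumption: staging_dir and watched_dir are on the same filesystem, so the
    final rename is a single atomic operation on POSIX systems.
    """
    os.makedirs(staging_dir, exist_ok=True)
    os.makedirs(watched_dir, exist_ok=True)

    final_path = os.path.join(watched_dir, name)
    if os.path.exists(final_path):
        # Overwriting or appending to an existing file is the unsupported case,
        # so always publish under a new name. (This check is not race-free;
        # it is only a guard for the sketch.)
        raise FileExistsError(final_path)

    # Write the complete contents outside the watched directory first.
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

    # Move the finished file into the watched directory in one step.
    os.replace(tmp_path, final_path)
    return final_path
```

On HDFS the analogous approach would be to write to a staging path and then rename it into the monitored directory (for example with hdfs dfs -mv), using a new file name for each batch rather than appending to or overwriting an existing one.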



On Tue, Jan 16, 2018 at 3:50 PM, kant kodali <kanth...@gmail.com> wrote:

> Got it! What about overwriting the same file instead of appending?
>
> On Mon, Jan 15, 2018 at 7:47 PM, Gourav Sengupta <
> gourav.sengu...@gmail.com> wrote:
>
>> What Gerard means is that if you are adding new files into the same base
>> path (key) then it's fine, but if you are appending lines to the same
>> file then the changes will not be picked up.
>>
>> Regards,
>> Gourav Sengupta
>>
>> On Tue, Jan 16, 2018 at 12:20 AM, kant kodali <kanth...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am not sure I understand. Any examples?
>>>
>>> On Mon, Jan 15, 2018 at 3:45 PM, Gerard Maas <gerard.m...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> You can monitor a filesystem directory as streaming source as long as
>>>> the files placed there are atomically copied/moved into the directory.
>>>> Updating the files is not supported.
>>>>
>>>> kr, Gerard.
>>>>
>>>> On Mon, Jan 15, 2018 at 11:41 PM, kant kodali <kanth...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I am wondering if HDFS can be a streaming source like Kafka in Spark
>>>>> 2.2.0? For example, can I have stream1 reading from Kafka and writing to
>>>>> HDFS, and stream2 reading from HDFS and writing back to Kafka, such that
>>>>> stream2 pulls the latest updates written by stream1?
>>>>>
>>>>> Thanks!
>>>>>
>>>>
>>>>
>>>
>>
>


-- 
Best Regards,
Ayan Guha
