http://spark.apache.org/docs/1.0.0/streaming-programming-guide.html#input-sources
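
To make the "atomically copied/moved" requirement from the thread below concrete, here is a minimal sketch of the producer side. It stages each file outside the monitored directory and renames it in only once it is complete, so a directory-watching stream never sees a partial file. The function name `publish_atomically` and the staging/watched directory layout are hypothetical, chosen for illustration. The sketch uses the local filesystem and assumes both directories are on the same filesystem, which is what makes the rename a single step. On HDFS the analogous move would be a rename through the HDFS client, which is not shown here.

```python
import os
import tempfile


def publish_atomically(data: bytes, staging_dir: str, watched_dir: str, filename: str) -> str:
    """Write data to a staging file, then rename it into watched_dir in one step.

    Hypothetical helper for illustration: staging_dir and watched_dir are
    assumed to be on the same filesystem so that the rename is atomic.
    """
    os.makedirs(staging_dir, exist_ok=True)
    os.makedirs(watched_dir, exist_ok=True)

    final_path = os.path.join(watched_dir, filename)
    # Updating a file that was already picked up is not supported by the
    # file source, so refuse to reuse a name. This is a simple check and
    # is not race-free if several writers share the directory.
    if os.path.exists(final_path):
        raise FileExistsError(final_path)

    # Write the complete content outside the monitored directory first.
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # The file appears in watched_dir fully written, under a new name.
        os.rename(tmp_path, final_path)
    except BaseException:
        # Do not leave a half-written staging file behind on failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```

Each batch goes out under a fresh file name, for example `publish_atomically(b"a\nb\n", "/data/staging", "/data/incoming", "part-0001.txt")` (paths are placeholders). Appending to or overwriting `part-0001.txt` afterwards would not be picked up, which is the case the thread warns about.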

On Tue, Jan 16, 2018 at 3:50 PM, kant kodali <kanth...@gmail.com> wrote:

> Got it! What about overwriting the same file instead of appending?
>
> On Mon, Jan 15, 2018 at 7:47 PM, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>
>> What Gerard means is that if you are adding new files to the same base
>> path (key) then it's fine, but if you are appending lines to the same
>> file then the changes will not be picked up.
>>
>> Regards,
>> Gourav Sengupta
>>
>> On Tue, Jan 16, 2018 at 12:20 AM, kant kodali <kanth...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am not sure I understand. Any examples?
>>>
>>> On Mon, Jan 15, 2018 at 3:45 PM, Gerard Maas <gerard.m...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> You can monitor a filesystem directory as a streaming source as long as
>>>> the files placed there are atomically copied/moved into the directory.
>>>> Updating the files is not supported.
>>>>
>>>> kr, Gerard.
>>>>
>>>> On Mon, Jan 15, 2018 at 11:41 PM, kant kodali <kanth...@gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I am wondering if HDFS can be a streaming source like Kafka in Spark
>>>>> 2.2.0? For example, can I have stream1 reading from Kafka and writing to
>>>>> HDFS, and stream2 reading from HDFS and writing back to Kafka, such that
>>>>> stream2 pulls the latest updates written by stream1?
>>>>>
>>>>> Thanks!

--
Best Regards,
Ayan Guha