You can listen to files in a specific directory using streamingContext.fileStream. Take a look at: http://spark.apache.org/docs/latest/streaming-programming-guide.html

On Thu, Sep 15, 2016 at 10:31 AM, Jörn Franke <jornfra...@gmail.com> wrote:
> Hi,
> I recommend that the third-party application put an empty file with the
> same filename as the original file, but with the extension ".uploaded".
> This is an indicator that the file has been fully (!) written to the fs;
> otherwise you risk reading only parts of the file.
> Then you can have a file-system listener for this ".uploaded" file.
>
> Spark Streaming and Kafka are not needed/suitable if the server is a file
> server. You can use Oozie (maybe with a simple custom action) to poll for
> ".uploaded" files and transmit them.
>
> On 15 Sep 2016, at 19:00, Kappaganthu, Sivaram (ES) <sivaram.kappagan...@adp.com> wrote:
>> Hello,
>>
>> I am a newbie to Spark and I have the requirement below.
>>
>> Problem statement: A third-party application is continuously dumping
>> files onto a server, typically around 100 files per hour, each smaller
>> than 50 MB. My application has to process those files.
>>
>> 1) Is it possible for Spark Streaming to trigger a job after a file is
>> placed, instead of triggering a job at a fixed batch interval?
>> 2) If it is not possible with Spark Streaming, can we control this with
>> Kafka/Flume?
>>
>> Thanks,
>> Sivaram
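The directory-listening suggestion at the top of the thread can be sketched in the Python API, where the equivalent of fileStream is textFileStream. This is a minimal sketch, assuming PySpark is installed; the HDFS path, app name, and the 60-second batch interval are illustrative, not from the thread.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="FileIngest")
ssc = StreamingContext(sc, 60)  # check the directory for new files every 60s

# Picks up files that appear in the directory after the stream starts.
lines = ssc.textFileStream("hdfs:///landing/incoming")
lines.count().pprint()  # e.g. report how many lines arrived per batch

ssc.start()
ssc.awaitTermination()
```

Note this does not answer question 1) directly: the stream still runs on a fixed batch interval, and a batch in which no new files appeared simply processes an empty RDD.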
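Jörn's ".uploaded" marker check can be sketched without Spark as a plain polling function. This is a hypothetical sketch, not code from the thread; it assumes the marker name is the data filename with ".uploaded" appended, and the `ready_files` helper and `processed` set are illustrative names.

```python
import os

def ready_files(directory, processed):
    """Return paths of data files whose '.uploaded' marker exists
    and that have not been handed out before."""
    ready = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".uploaded"):
            continue  # only react to marker files, never to partial uploads
        data_name = name[: -len(".uploaded")]
        data_path = os.path.join(directory, data_name)
        # Hand each data file out exactly once, and only if it really exists.
        if data_name not in processed and os.path.exists(data_path):
            ready.append(data_path)
            processed.add(data_name)
    return ready
```

A poller (e.g. a cron job or an Oozie custom action, as suggested above) would call this periodically and submit each returned path for processing.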