Why isn't Flume an option here?

Alonso Isidoro Roman
about.me/alonso.isidoro.roman
2016-10-05 11:14 GMT+02:00 Kappaganthu, Sivaram (ES) <sivaram.kappagan...@adp.com>:

> Hi Franke,
>
> Thanks for your reply. I am trying this as follows.
>
> The third-party application 1) dumps the original file into one directory and 2) puts the .upload file into another directory. I am writing logic to listen to the directory that contains the .upload files.
>
> Here I need to match the file name across both directories. Could you please suggest how to get the file name in streaming? (A sketch of one approach is appended below the thread.)
>
> val sc = new SparkContext("local[*]", "test")
> val ssc = new StreamingContext(sc, Seconds(4))
> val dStream = ssc.textFileStream(pathOfDirToStream)
> dStream.foreachRDD { eventsRdd => /* How to get the file name */ }
>
> From: Jörn Franke [mailto:jornfra...@gmail.com]
> Sent: Thursday, September 15, 2016 11:02 PM
> To: Kappaganthu, Sivaram (ES)
> Cc: user@spark.apache.org
> Subject: Re: Spark Streaming -- for each new file in HDFS
>
> Hi,
>
> I recommend that the third-party application put an empty file with the same filename as the original file, but with the extension ".uploaded". This is an indicator that the file has been fully (!) written to the file system; otherwise you risk reading only part of the file. Then you can have a file-system listener for this .uploaded file. (A sketch of such a poller is also appended below the thread.)
>
> Spark Streaming and Kafka are not needed or suitable if the server is a file server. You can use Oozie (maybe with a simple custom action) to poll for .uploaded files and transmit them.
>
> On 15 Sep 2016, at 19:00, Kappaganthu, Sivaram (ES) <sivaram.kappagan...@adp.com> wrote:
>
> Hello,
>
> I am a newbie to Spark and I have the following requirement.
>
> Problem statement: A third-party application continuously dumps files onto a server, typically about 100 files per hour, each less than 50 MB. My application has to process those files.
>
> 1) Is it possible for Spark Streaming to trigger a job as soon as a file is placed, rather than at a fixed batch interval?
> 2) If this is not possible with Spark Streaming, can we control it with Kafka/Flume?
>
> Thanks,
> Sivaram
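On the file-name question above: textFileStream does not expose file names, but fileStream gives access to the underlying Hadoop records, and each batch RDD it produces is (as an implementation detail of FileInputDStream in Spark 1.x/2.x, not a public contract) a union of one Hadoop RDD per newly detected file. A minimal sketch under that assumption, with a hypothetical directory path:

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.{FileSplit, TextInputFormat}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.{NewHadoopRDD, UnionRDD}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FileNameStream {
  def main(args: Array[String]): Unit = {
    val sc  = new SparkContext("local[*]", "test")
    val ssc = new StreamingContext(sc, Seconds(4))

    // fileStream exposes (key, value) records, unlike textFileStream.
    val stream =
      ssc.fileStream[LongWritable, Text, TextInputFormat]("/path/to/dir") // hypothetical path

    val withFileNames = stream.transform { rdd =>
      // Assumption: the batch RDD is a union whose parents are per-file Hadoop RDDs.
      val perFile = rdd.dependencies.map(_.rdd).collect {
        case hadoopRdd: NewHadoopRDD[LongWritable @unchecked, Text @unchecked] =>
          hadoopRdd.mapPartitionsWithInputSplit { (split, records) =>
            // The input split of a file-based format carries the file path.
            val fileName = split.asInstanceOf[FileSplit].getPath.toString
            records.map { case (_, line) => (fileName, line.toString) }
          }
      }
      new UnionRDD(rdd.context, perFile)
    }

    withFileNames.foreachRDD { rdd =>
      rdd.foreach { case (file, line) => println(s"$file: $line") }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

Because this leans on DStream internals (and mapPartitionsWithInputSplit is a @DeveloperApi), it should be verified against the Spark version actually in use.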
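And to make Jörn's marker-file approach concrete: the poller needs no Spark Streaming at all; a plain loop over a Hadoop FileSystem listing can trigger an ordinary batch job per finished file. A minimal sketch, assuming a hypothetical layout in which the marker <name>.uploaded in one directory corresponds to the data file <name> in another:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.SparkContext

object UploadedMarkerPoller {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "marker-poller")
    val fs = FileSystem.get(new Configuration())

    // Hypothetical directories: markers in /data/markers, data files in /data/incoming.
    val markerDir = new Path("/data/markers")
    val dataDir   = new Path("/data/incoming")

    while (true) {
      // Only files whose marker exists are guaranteed to be fully written.
      val markers = fs.listStatus(markerDir).filter(_.getPath.getName.endsWith(".uploaded"))
      for (marker <- markers) {
        val dataName = marker.getPath.getName.stripSuffix(".uploaded")
        val dataFile = new Path(dataDir, dataName)

        // Run an ordinary batch job on exactly this file.
        val lineCount = sc.textFile(dataFile.toString).count()
        println(s"Processed $dataFile: $lineCount lines")

        // Remove the marker so the file is not picked up again.
        fs.delete(marker.getPath, false)
      }
      Thread.sleep(10000) // poll interval; tune to the expected arrival rate
    }
  }
}

The same loop could equally be driven by an Oozie coordinator or a cron job, as Jörn suggests; the essential point is that processing is keyed off the marker, never off the data file itself.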