Hi Renyi,

This is the intended behavior of the streaming HdfsWordCount example. It uses 'textFileStream', which monitors an HDFS directory for newly created files and pushes their contents into a DStream. Because only newly created files are monitored, a words.txt that was already in the directory is not picked up. The example is meant to run indefinitely until it is interrupted, for example with Ctrl-C.
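If you want something closer to what you describe, here is a minimal sketch rather than a tested program. It is not part of the shipped example. It assumes the more general 'fileStream' call with newFilesOnly = false to ask for files that already exist, and awaitTerminationOrTimeout followed by stop() to end the job after a fixed time. The object name BoundedHdfsWordCount, the helper words, the hidden-file filter, and the 60 second timeout are all made up for illustration. As far as I know, files older than the stream's internal remember window can still be skipped even with newFilesOnly = false, so treat that part as best effort.

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical variant of HdfsWordCount; the name is an assumption.
object BoundedHdfsWordCount {

  // Split a line on spaces and drop empty tokens.
  def words(line: String): Seq[String] =
    line.split(" ").filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit = {
    val directory = args(0)
    val conf = new SparkConf().setAppName("BoundedHdfsWordCount")
    val ssc = new StreamingContext(conf, Seconds(2))

    // newFilesOnly = false asks the stream to also consider files that
    // were already present when the context started (best effort only).
    val lines = ssc
      .fileStream[LongWritable, Text, TextInputFormat](
        directory,
        (path: Path) => !path.getName.startsWith("."), // skip hidden files
        newFilesOnly = false)
      .map { case (_, text) => text.toString }

    val wordCounts = lines
      .flatMap(line => words(line))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    wordCounts.print()

    ssc.start()
    // Wait at most 60 seconds (an arbitrary value), then shut down
    // instead of blocking forever as awaitTermination() would.
    ssc.awaitTerminationOrTimeout(60 * 1000L)
    ssc.stop(stopSparkContext = true, stopGracefully = true)
  }
}
```

This stops on a timer, not on "no new files arrived". Detecting an idle directory would need extra bookkeeping, for example counting consecutive empty batches, which the sketch leaves out.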
-bryan

On Nov 13, 2015 10:52 AM, "Renyi Xiong" <renyixio...@gmail.com> wrote:

> Hi,
>
> I try to run the following 1.4.1 sample by putting a words.txt under
> localdir
>
> bin\run-example org.apache.spark.examples.streaming.HdfsWordCount localdir
>
> 2 questions
>
> 1. it does not pick up words.txt because it's 'old' I guess - any option
> to let it picked up?
> 2. I managed to put a 'new' file on the fly which got picked up, but after
> processing, the program doesn't stop (keeps generating empty RDDs instead),
> any option to let it stop when no new files come in (otherwise it blocks
> others when I want to run multiple samples?)
>
> Thanks,
> Renyi.
>