Using Spark Streaming to listen to HDFS directory and handle different files by file name

ZhangYi Thu, 14 Aug 2014 07:54:50 -0700

As we know, in Spark, SparkContext provides the wholeTextFiles() method to read 
all files in a given directory and generate an RDD of (fileName, content) pairs:
scala> val lines = sc.wholeTextFiles("/Users/workspace/scala101/data")
14/08/14 22:43:02 INFO MemoryStore: ensureFreeSpace(35896) called with 
curMem=0, maxMem=318111744
14/08/14 22:43:02 INFO MemoryStore: Block broadcast_0 stored as values in 
memory (estimated size 35.1 KB, free 303.3 MB)
lines: org.apache.spark.rdd.RDD[(String, String)] = 
/Users/workspace/scala101/data WholeTextFileRDD[0] at wholeTextFiles at 
<console>:12
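
To make the goal concrete, here is a rough sketch of the kind of per-file handling I have in mind on top of those (fileName, content) pairs. The object name, the handle function, and the .csv/.log suffixes are all hypothetical and only for illustration; a plain Seq stands in for the RDD so the snippet runs without a cluster.

```scala
// Hypothetical sketch: choose a handler based on the file name.
// The suffixes and the summaries produced here are made up for illustration.
object HandleByFileName {
  // Accepts the (path, content) pair shape that wholeTextFiles produces.
  def handle(path: String, content: String): String = {
    // Keep only the last path segment, e.g. "a.csv".
    val name = path.substring(path.lastIndexOf('/') + 1)
    if (name.endsWith(".csv")) {
      val rows = content.split('\n').length
      name + ": " + rows + " csv rows"
    } else if (name.endsWith(".log")) {
      name + ": " + content.length + " log chars"
    } else {
      name + ": skipped"
    }
  }

  def main(args: Array[String]): Unit = {
    // Local stand-in for the RDD[(String, String)] shown above.
    val sample = Seq(
      ("file:/Users/workspace/scala101/data/a.csv", "1,2\n3,4"),
      ("file:/Users/workspace/scala101/data/b.log", "started")
    )
    sample.map { case (path, content) => handle(path, content) }.foreach(println)
  }
}
```

In the spark-shell session above, the same function could presumably be applied as lines.map { case (path, content) => HandleByFileName.handle(path, content) }.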



Does StreamingContext provide a similar function to listen for incoming files 
on HDFS, so that I can handle different files by file name in Spark Streaming?
 

--  
ZhangYi (张逸)
Developer
tel: 15023157626
blog: agiledon.github.com
weibo: tw张逸
