As we know, SparkContext in Spark provides the wholeTextFiles() method, which
reads all files in a given directory and produces an RDD of (fileName, content)
pairs:
scala> val lines = sc.wholeTextFiles("/Users/workspace/scala101/data")
14/08/14 22:43:02 INFO MemoryStore: ensureFreeSpace(35896) called with
curMem=0, maxMem=318111744
14/08/14 22:43:02 INFO MemoryStore: Block broadcast_0 stored as values in
memory (estimated size 35.1 KB, free 303.3 MB)
lines: org.apache.spark.rdd.RDD[(String, String)] =
/Users/workspace/scala101/data WholeTextFileRDD[0] at wholeTextFiles at
<console>:12
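A minimal sketch of handling files by name in the batch case may make the goal
concrete. It assumes `sc` is the SparkContext that spark-shell creates; the
".log" suffix and the line-count logic are hypothetical placeholders, not
something taken from the actual data.

```scala
// Assumes `sc` is the SparkContext created by spark-shell.
val files = sc.wholeTextFiles("/Users/workspace/scala101/data")

// Each element is a (fileName, content) pair, so we can branch on the name.
// The ".log" suffix is a hypothetical example used only for illustration.
val lineCounts = files
  .filter { case (fileName, _) => fileName.endsWith(".log") }
  .map { case (fileName, content) => (fileName, content.split("\n").length) }

// Bring the (small) result back to the driver and print it.
lineCounts.collect().foreach { case (fileName, n) =>
  println(s"$fileName: $n lines")
}
```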
Does StreamingContext provide a similar function for watching incoming files on
HDFS, so that I can handle each file by its file name in Spark Streaming?
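For reference, the closest existing call appears to be
StreamingContext.textFileStream, which watches a directory for new files but
yields only their lines, so the file name is not available. A minimal sketch,
assuming spark-shell's `sc`, a hypothetical HDFS directory, and an arbitrary
10-second batch interval:

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Assumes `sc` is the SparkContext created by spark-shell.
// The 10-second batch interval is an arbitrary choice for illustration.
val ssc = new StreamingContext(sc, Seconds(10))

// Hypothetical directory; files that newly appear in it are read as text.
val lines = ssc.textFileStream("hdfs:///tmp/incoming")

// Each batch arrives as an RDD[String] of lines, with no file name attached.
lines.foreachRDD { rdd =>
  println("lines in this batch: " + rdd.count())
}

ssc.start()
ssc.awaitTermination()
```

What is missing here is the (fileName, content) pairing that wholeTextFiles
gives in the batch API.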
--
ZhangYi (张逸)
Developer
tel: 15023157626
blog: agiledon.github.com
weibo: tw张逸