Join between DStream and Periodically-Changing-RDD

2015-06-09 Thread Ilove Data
Hi, I'm trying to join a DStream with a batch interval of, say, 20s against an RDD loaded from an HDFS folder that changes periodically; say a new file arrives in the folder every 10 minutes. How should this be done, given that new files keep being added to the folder?

Re: Join between DStream and Periodically-Changing-RDD

2015-06-14 Thread Ilove Data
eam
>> RDDs
>>
>> You can feed your HDFS file into a Message Broker topic and consume it
>> from there in the form of DStream RDDs which you keep aggregating over the
>> lifetime of the spark streaming app instance
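
For what it's worth, here is a rough sketch of how the quoted suggestion might look in PySpark. This is not code from the thread. It assumes both inputs are already DStreams of (key, value) pairs built on the same StreamingContext with checkpointing enabled. The names `latest_reference`, `join_with_reference`, `events` and `reference_updates` are made up for illustration, and "keep the newest value per key" is only one possible way to aggregate the reference data.

```python
def latest_reference(new_values, current):
    """Update function in the shape PySpark's updateStateByKey expects.

    new_values: reference values that arrived for this key in the current batch
    current:    value kept from earlier batches, or None if the key is new
    """
    # Prefer the newest value from this batch; otherwise keep the old one.
    return new_values[-1] if new_values else current


def join_with_reference(events, reference_updates):
    """Join keyed events against the reference data accumulated so far.

    Assumption: both arguments are DStreams of (key, value) pairs from the
    same StreamingContext, and ssc.checkpoint(...) has been called, since
    stateful operations need a checkpoint directory.
    """
    # State stream: one (key, latest reference value) pair per known key,
    # re-emitted on every batch for the lifetime of the application.
    reference_state = reference_updates.updateStateByKey(latest_reference)
    # Inner join: events whose key has no reference entry yet are dropped.
    return events.join(reference_state)
```

Each joined record would have the form (key, (event_value, reference_value)). If events without a reference entry must be kept, `leftOuterJoin` could be used in place of `join`.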