There is no mechanism for keeping an RDD up to date with a changing source. 
However, you could set up a stream that watches the directory for changes and 
processes the new files, or use the Hive integration in Spark SQL to run Hive 
queries directly. (Old query results will still go stale, though.)
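
For the streaming option, here is a rough sketch of what I have in mind, 
written against the PySpark DStream API. The names is_record and watch, the 
app name, and the 30-second batch interval are all made up for illustration, 
and the "count non-blank lines" step is just a stand-in for whatever 
processing you actually need. Note that textFileStream only picks up files 
that appear in the directory after the stream has started.

```python
def is_record(line):
    """Hypothetical filter: treat any non-blank line as a record."""
    return bool(line.strip())


def watch(directory, batch_seconds=30):
    """Process files as they land in `directory` (illustrative sketch)."""
    # Imported here so is_record stays usable without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="WatchDirectory")  # app name is an assumption
    ssc = StreamingContext(sc, batch_seconds)

    # Each batch contains only the lines from files that are new
    # since the previous batch.
    lines = ssc.textFileStream(directory)

    def handle_batch(rdd):
        # Placeholder processing: count the records in the new files.
        print("new records:", rdd.filter(is_record).count())

    lines.foreachRDD(handle_batch)

    ssc.start()
    ssc.awaitTermination()
```

You would call watch() with the HDFS directory you want monitored. This does 
not refresh any RDD you built earlier; it only gives you a fresh batch for 
each set of new files, so anything derived from old data has to be recomputed 
or merged by your own code.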


> On May 31, 2015, at 7:11 AM, Ashish Mukherjee wrote:
> 
> Hello,
> 
> Since RDDs are created from data from Hive tables or HDFS, how do we ensure 
> they are invalidated when the source data is updated?
> 
> Regards,
> Ashish
