However, the cache is not guaranteed to remain: if other jobs launched in the cluster require more memory than what is left of the overall caching memory, previously cached RDDs will be discarded.
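As a rough illustration (a minimal sketch, not from the original thread; the input path and the word-count logic are made up), the storage level chosen at persist time decides what happens on eviction:

// Sketch only: MEMORY_ONLY partitions are simply dropped under memory pressure
// and recomputed on the next access, while MEMORY_AND_DISK spills evicted
// partitions to local disk instead.
import org.apache.spark.storage.StorageLevel

val counts = sc.textFile("hdfs:///input/docs")      // `sc`: an existing SparkContext
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1L))
  .reduceByKey(_ + _)

counts.persist(StorageLevel.MEMORY_AND_DISK)        // safer than the default MEMORY_ONLY
counts.count()                                      // materializes the cache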
Using an off-heap cache like Tachyon as a dump repository can help. In general, I'd say that using a persistent sink (Cassandra, for instance) is best.

my .2¢

aℕdy ℙetrella
about.me/noootsab
<http://about.me/noootsab>

On Sat, Sep 13, 2014 at 9:20 AM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:

> You can cache data in memory & query it using Spark Job Server.
> Most folks dump data down to a queue/db for retrieval.
> You can batch up data & store it into Parquet partitions as well, & query it
> using another SparkSQL shell; the JDBC driver in SparkSQL is part of 1.1, I
> believe.
> --
> Regards,
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi
>
>
> On Fri, Sep 12, 2014 at 2:54 PM, Marius Soutier <mps....@gmail.com> wrote:
>
>> Hi there,
>>
>> I’m pretty new to Spark, and so far I’ve written my jobs the same way I
>> wrote Scalding jobs - one-off: read data from HDFS, count words, write
>> counts back to HDFS.
>>
>> Now I want to display these counts in a dashboard. Since Spark allows you
>> to cache RDDs in memory and you have to explicitly terminate your app (and
>> there’s even a new JDBC server in 1.1), I’m assuming it’s possible to keep
>> an app running indefinitely and query an in-memory RDD from the outside
>> (via SparkSQL, for example).
>>
>> Is this how others are using Spark? Or are you just dumping job results
>> into message queues or databases?
>>
>>
>> Thanks
>> - Marius
>>
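Following up on the cached-table and Parquet suggestions quoted above, here is a minimal sketch using the Spark 1.1-era API. It is only an illustration: the paths, the table name, and the WordCount case class are made up, and it is not a drop-in solution.

// Sketch only: keep a cached table in a long-running app, and/or dump to Parquet.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class WordCount(word: String, count: Long)     // hypothetical result schema

object WordCountDashboard {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("word-count-dashboard"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD               // RDD[WordCount] -> SchemaRDD

    val counts = sc.textFile("hdfs:///input/docs")
      .flatMap(_.split("\\s+"))
      .map(w => (w, 1L))
      .reduceByKey(_ + _)
      .map { case (w, c) => WordCount(w, c) }

    // Option 1: keep the application alive and serve queries from a cached table
    // (e.g. from a SparkSQL query running in the same long-lived context).
    counts.registerTempTable("word_counts")
    sqlContext.cacheTable("word_counts")
    sqlContext.sql("SELECT word, count FROM word_counts ORDER BY count DESC LIMIT 10")
      .collect()
      .foreach(println)

    // Option 2: dump the results to Parquet so other tools / SparkSQL shells can
    // read them even after this application has finished.
    counts.saveAsParquetFile("hdfs:///output/word_counts.parquet")
  }
}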