On Thu, Jun 12, 2014 at 3:03 PM, Christopher Nguyen <c...@adatao.com> wrote:
> Toby, #saveAsTextFile() and #saveAsObjectFile() are probably what you want
> for your use case.

Yes. Thank you. I'm about to see if they exist for Python.

> As for Parquet support, that's newly arrived in Spark 1.0.0 together with
> SparkSQL so continue to watch this space.

Okay.

> Gerard's suggestion to look at JobServer, which you can generalize as
> "building a long-running application which allows multiple clients to
> load/share/persist/save/collaborate-on RDDs" satisfies a larger, more
> complex use case. That is indeed the job of a higher-level application,
> subject to a wide variety of higher-level design choices. A number of us
> have successfully built Spark-based apps around that model.

To my eyes, as someone new to Spark, that seems like a sledgehammer being
used to crack a nut. If RDDs persisted across jobs (a seemingly tiny
change), I wouldn't need JobServer (a whole new application). There's a ton
of functionality in JobServer that I don't think I have any use for yet,
apart from that one feature: persisting RDDs across jobs.