Re: Questions about caching

2019-01-01 Thread Gourav Sengupta
Hi Andrew, if you use the Spark UI, all your questions are already answered there. Let me know if you need any help browsing the UI to look at the contents that are cached. Regards, Gourav On Tue, 11 Dec 2018, 17:13 Andrew Melo wrote: Greetings, Spark Aficionados- I'm working on a project to (ab-)

Re: Questions about caching

2018-12-24 Thread Bin Fan
Hi Andrew, since you mentioned the alternative solution with Alluxio, here is a more comprehensive tutorial on caching Spark dataframes on Alluxio: https://www.alluxio.com/blog/effective-spark-dataframes-with-alluxio Namely, caching your dataframe is simply running df.write.p

Re: Questions about caching

2018-12-18 Thread Reza Safi
Hi Andrew, 1) df2 will cache all the columns. 2) In Spark 2 you will receive a warning like: WARN execution.CacheManager: Asked to cache already cached data. I don't recall whether it is the same in 1.6. It seems you are not using Spark 2. 2a) I'm not sure whether you are suggesting a feature for Spark.

Questions about caching

2018-12-11 Thread Andrew Melo
Greetings, Spark Aficionados- I'm working on a project to (ab-)use PySpark to do particle physics analysis, which involves iterating with a lot of transformations (to apply weights and select candidate events) and reductions (to produce histograms of relevant physics objects). We have a basic vers