Re: sqlContext.cacheTable("tableName") vs dataFrame.cache()

2016-01-19 Thread Jerry Lam
Is cacheTable similar to asTempTable before?

Re: sqlContext.cacheTable("tableName") vs dataFrame.cache()

2016-01-19 Thread George Sigletos
Thanks Kevin for your reply. I was suspecting the same thing as well, although it still does not make much sense to me why you would need to do both:

myData.cache()
sqlContext.cacheTable("myData")

in case you are using both the sqlContext and DataFrames to execute queries, i.e. dataFrame.select(...) and sqlContext.sql(...).

Re: sqlContext.cacheTable("tableName") vs dataFrame.cache()

2016-01-15 Thread Kevin Mellott
Hi George, I believe that sqlContext.cacheTable("tableName") is to be used when you want to cache the data that is being used within a Spark SQL query. For example, take a look at the code below:

val myData = sqlContext.load("com.databricks.spark.csv", Map("path" -> "hdfs://somepath/file", "

sqlContext.cacheTable("tableName") vs dataFrame.cache()

2016-01-15 Thread George Sigletos
According to the documentation they are exactly the same, but in my queries dataFrame.cache() results in much faster execution times than doing sqlContext.cacheTable("tableName"). Is there any explanation for this? I am not caching the RDD prior to creating the DataFrame. Using PySpark on Spark