According to the documentation, `dataFrame.cache()` and `sqlContext.cacheTable("tableName")` should behave identically, but in my queries `dataFrame.cache()` results in much faster execution times than `sqlContext.cacheTable("tableName")`. Is there any explanation for this? I am not caching the underlying RDD before creating the DataFrame. I am using PySpark on Spark 1.5.2.

Kind regards,
George
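For reference, a minimal sketch of the two approaches I am comparing, using the Spark 1.5-era PySpark API (the sample data, table name, and local master setting are illustrative assumptions, not my actual workload):

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext

# Illustrative local setup; my real job runs against a cluster.
sc = SparkContext("local", "cache-comparison")
sqlContext = SQLContext(sc)

df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Approach 1: cache the DataFrame directly.
# Caching is lazy; the first action materializes the cached data.
df.cache()
df.count()

# Approach 2: register the DataFrame as a temp table,
# then cache it by name through the SQLContext.
df.registerTempTable("tableName")
sqlContext.cacheTable("tableName")
sqlContext.sql("SELECT COUNT(*) FROM tableName").collect()
```

Both runs execute the same queries afterwards; only the caching call differs, yet the first approach is consistently much faster for me.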