Is cacheTable similar to asTempTable from before?
Sent from my iPhone
> On 19 Jan, 2016, at 4:18 am, George Sigletos wrote:
>
> Thanks Kevin for your reply.
>
> I was suspecting the same thing as well, although it still does not make much
> sense to me why you would need to do both:
>
> myData.cache()
> sqlContext.cacheTable("myData")
Thanks Kevin for your reply.
I was suspecting the same thing as well, although it still does not make
much sense to me why you would need to do both:
myData.cache()
sqlContext.cacheTable("myData")
in case you are using both the sqlContext and DataFrames to execute queries:
dataframe.select(...) and sqlContext.sql(...)
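To make it concrete, this is roughly the pattern I am asking about (a simplified sketch for a spark-shell session where sqlContext is already defined; the data, table and column names are just placeholders):

// Illustrative stand-in for the CSV data loaded elsewhere in this thread
val myData = sqlContext.createDataFrame(Seq((1, "a"), (2, "b"))).toDF("id", "someColumn")

// Cache the DataFrame for DataFrame-API queries ...
myData.cache()

// ... and also register and cache it as a table for SQL queries
myData.registerTempTable("myData")
sqlContext.cacheTable("myData")

// DataFrame-side query
myData.select("someColumn").show()

// SQL-side query against the same data
sqlContext.sql("SELECT someColumn FROM myData").show()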
Hi George,
I believe that sqlContext.cacheTable("tableName") is to be used when you
want to cache the data that is being used within a Spark SQL query. For
example, take a look at the code below.
> val myData = sqlContext.load("com.databricks.spark.csv", Map("path" ->
>   "hdfs://somepath/file", "header" -> "true"))
According to the documentation they are exactly the same, but in my queries
dataFrame.cache()
results in much faster execution times vs doing
sqlContext.cacheTable("tableName")
Is there any explanation for this? I am not caching the RDD prior to
creating the DataFrame. Using PySpark on Spark
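Concretely, the two variants being compared look like this (a sketch with placeholder names, assuming a spark-shell style session where sqlContext is defined and dataFrame is already loaded; I am on PySpark, where the equivalent calls exist, but the code in this thread is Scala):

// Variant 1: cache the DataFrame itself, then query it
dataFrame.cache()
dataFrame.count()   // much faster in my runs

// Variant 2: register the DataFrame as a table, cache the table, then query it
dataFrame.registerTempTable("tableName")
sqlContext.cacheTable("tableName")
sqlContext.sql("SELECT COUNT(*) FROM tableName").collect()   // noticeably slower in my runs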