So I looked at the function; my only worry is whether the cache gets cleared when I overwrite it by registering another table under the same name. I ran this experiment and isCached reports the table as not cached, but I want to confirm: beyond the query no longer using the old table's values, is the cached data actually removed from memory or overwritten?
scala> df.collect
res54: Array[org.apache.spark.sql.Row] = Array([blue,#0033FF], [red,#FF0000], [green,#FSKA])   <=== 3 rows

scala> df2.collect
res55: Array[org.apache.spark.sql.Row] = Array([blue,#0033FF], [red,#FF0000])   <=== 2 rows

scala> df.registerTempTable("myColorsTable")

scala> sqlContext.isCached("myColorsTable")
res58: Boolean = false

scala> sqlContext.cacheTable("myColorsTable")   <=== cache the table backed by df (3 rows)

scala> sqlContext.isCached("myColorsTable")
res60: Boolean = true

scala> sqlContext.sql("select * from myColorsTable").foreach(println)   <=== SQL runs against df (3 rows)
[blue,#0033FF]
[red,#FF0000]
[green,#FSKA]

scala> df2.registerTempTable("myColorsTable")   <=== register another DataFrame under the same table name

scala> sqlContext.isCached("myColorsTable")
res63: Boolean = false

scala> sqlContext.sql("select * from myColorsTable").foreach(println)   <=== SQL now runs against df2 (2 rows)
[blue,#0033FF]
[red,#FF0000]

On Fri, Dec 18, 2015 at 11:17 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> This method in CacheManager:
>
>   private[sql] def lookupCachedData(plan: LogicalPlan): Option[CachedData] = readLock {
>     cachedData.find(cd => plan.sameResult(cd.plan))
>
> led me to the following in
> sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala:
>
>   def sameResult(plan: LogicalPlan): Boolean = {
>
> There is a detailed comment above this method which should give some idea.
>
> Cheers
>
> On Fri, Dec 18, 2015 at 9:21 AM, Sahil Sareen <sareen...@gmail.com> wrote:
>
>> Thanks Ted!
>>
>> Yes, the schema might be different or the same.
>> What would be the answer for each situation?
>>
>> On Fri, Dec 18, 2015 at 6:02 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> CacheManager#cacheQuery() is called where:
>>>   * Caches the data produced by the logical representation of the given [[Queryable]].
>>> ...
>>>   val planToCache = query.queryExecution.analyzed
>>>   if (lookupCachedData(planToCache).nonEmpty) {
>>>
>>> Is the schema for dfNew different from that of dfOld?
>>>
>>> Cheers
>>>
>>> On Fri, Dec 18, 2015 at 3:33 AM, Sahil Sareen <sareen...@gmail.com> wrote:
>>>
>>>> Spark 1.5.2
>>>>
>>>> dfOld.registerTempTable("oldTableName")
>>>> sqlContext.cacheTable("oldTableName")
>>>> // ....
>>>> // do something
>>>> // ....
>>>> dfNew.registerTempTable("oldTableName")
>>>> sqlContext.cacheTable("oldTableName")
>>>>
>>>> Now when I use the "oldTableName" table I do get the latest contents
>>>> from dfNew, but do the contents of dfOld get removed from memory?
>>>>
>>>> Or is the right usage to do this:
>>>>
>>>> dfOld.registerTempTable("oldTableName")
>>>> sqlContext.cacheTable("oldTableName")
>>>> // ....
>>>> // do something
>>>> // ....
>>>> dfNew.registerTempTable("oldTableName")
>>>> sqlContext.uncacheTable("oldTableName")   <========== uncache the old contents first
>>>> sqlContext.cacheTable("oldTableName")
>>>>
>>>> -Sahil
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: user-h...@spark.apache.org
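The behavior seen in the transcript can be sketched with a toy model. This is NOT Spark's actual code, just plain Scala mimicking the two structures the thread discusses: a name catalog (name -> plan, which registerTempTable rebinds) and a CacheManager analogue keyed by logical plan via lookupCachedData/sameResult, not by name. All names below (CacheModel, Plan, cachedEntries) are invented for illustration.

```scala
// Toy model (hypothetical, not Spark internals): why re-registering a temp
// table name makes isCached return false while the old cached data is still
// held in memory until it is explicitly uncached.
object CacheModel {
  // Stand-in for a logical plan; df and df2 would get distinct plans.
  final case class Plan(id: String)

  private var catalog    = Map.empty[String, Plan] // temp table catalog: name -> plan
  private var cachedData = List.empty[Plan]        // CacheManager analogue: keyed by plan

  def registerTempTable(name: String, plan: Plan): Unit =
    catalog += (name -> plan)                      // rebinds the name only; touches no cache entry

  def cacheTable(name: String): Unit =
    cachedData ::= catalog(name)                   // caches whatever plan the name currently resolves to

  def isCached(name: String): Boolean =
    cachedData.contains(catalog(name))             // lookup goes name -> plan, then plan -> cache entry

  def uncacheTable(name: String): Unit =
    cachedData = cachedData.filterNot(_ == catalog(name))

  def cachedEntries: Int = cachedData.size         // entries still held in "memory"
}
```

In this sketch, caching df's plan under "myColorsTable" and then registering df2 under the same name makes isCached false (the name now resolves to df2's plan, which has no cache entry) while cachedEntries stays at 1: df's entry is never freed by the re-registration. That matches the transcript, and it is why the second pattern in the original question must uncache BEFORE re-registering: once the name is rebound, uncacheTable resolves it to the new plan and can no longer reach the old entry.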