From the UI I see two rows for this on a streaming application:

RDD Name                       | Storage Level                     | Cached Partitions | Fraction Cached | Size in Memory | Size in ExternalBlockStore | Size on Disk
In-memory table myColorsTable  | Memory Deserialized 1x Replicated | 2                 | 100%            | 728.2 KB       | 0.0 B                      | 0.0 B
In-memory table myColorsTable  | Memory Deserialized 1x Replicated | 2                 | 100%            | 728.2 KB       | 0.0 B                      | 0.0 B

This means it wasn't overwritten :( My question now is: if only the latest table is going to be used, why isn't the earlier version cleared automatically?
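For what it's worth, the two Storage rows are consistent with the cache being keyed by the logical plan rather than by the temp-table name, so rebinding the name leaves the old entry behind. A minimal sketch of that behavior in plain Scala (not Spark itself; the CachedEntry type and the "scan-df"/"scan-df2" plan strings are invented stand-ins for logical plans):

```scala
// Toy model of Spark's CacheManager: entries are keyed by the *plan*
// (modeled here as a string), not by the temp-table name.
case class CachedEntry(plan: String)

var cachedData: List[CachedEntry] = Nil          // the cache
var tempTables: Map[String, String] = Map.empty  // temp-table name -> plan

// Rebinds the name only; never touches cachedData.
def registerTempTable(name: String, plan: String): Unit =
  tempTables += (name -> plan)

// Analogue of lookupCachedData: cache the name's current plan if absent.
def cacheTable(name: String): Unit = {
  val plan = tempTables(name)
  if (!cachedData.exists(_.plan == plan))
    cachedData ::= CachedEntry(plan)
}

// Looks up whatever plan the name points at *now*.
def isCached(name: String): Boolean =
  cachedData.exists(_.plan == tempTables(name))

registerTempTable("myColorsTable", "scan-df")    // df (3 rows)
cacheTable("myColorsTable")
registerTempTable("myColorsTable", "scan-df2")   // same name, new plan
assert(!isCached("myColorsTable"))               // lookup by new plan misses
cacheTable("myColorsTable")
assert(cachedData.size == 2)                     // old entry is still resident
```

This mirrors the observations below: isCached flips to false after re-registering, and the old data stays in memory until it is uncached explicitly.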
On Fri, Dec 18, 2015 at 11:44 PM, Sahil Sareen <[email protected]> wrote:

> So I looked at the function; my only worry is that the cache should be
> cleared if I'm overwriting the cache with the same table name. I did this
> experiment and the cache shows the table as not cached, but I want to
> confirm this. Beyond the old table values no longer being used, are they
> actually removed/overwritten in memory?
>
> scala> df.collect
> res54: Array[org.apache.spark.sql.Row] = Array([blue,#0033FF], [red,#FF0000], [green,#FSKA])   <=== 3 rows
>
> scala> df2.collect
> res55: Array[org.apache.spark.sql.Row] = Array([blue,#0033FF], [red,#FF0000])   <=== 2 rows
>
> scala> df.registerTempTable("myColorsTable")
>
> scala> sqlContext.isCached("myColorsTable")
> res58: Boolean = false
>
> scala> sqlContext.cacheTable("myColorsTable")   <=== cache the table for df (3 rows)
>
> scala> sqlContext.isCached("myColorsTable")
> res60: Boolean = true
>
> scala> sqlContext.sql("select * from myColorsTable").foreach(println)   <=== sql runs against df (3 rows)
> [blue,#0033FF]
> [red,#FF0000]
> [green,#FSKA]
>
> scala> df2.registerTempTable("myColorsTable")   <=== register another table with the same name
>
> scala> sqlContext.isCached("myColorsTable")
> res63: Boolean = false
>
> scala> sqlContext.sql("select * from myColorsTable").foreach(println)   <=== sql runs against df2 (2 rows)
> [blue,#0033FF]
> [red,#FF0000]
>
> On Fri, Dec 18, 2015 at 11:17 PM, Ted Yu <[email protected]> wrote:
>
>> This method in CacheManager:
>>
>>   private[sql] def lookupCachedData(plan: LogicalPlan): Option[CachedData] = readLock {
>>     cachedData.find(cd => plan.sameResult(cd.plan))
>>
>> led me to the following in
>> sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala:
>>
>>   def sameResult(plan: LogicalPlan): Boolean = {
>>
>> There is a detailed comment above this method which should give some idea.
>>
>> Cheers
>>
>> On Fri, Dec 18, 2015 at 9:21 AM, Sahil Sareen <[email protected]> wrote:
>>
>>> Thanks Ted!
>>>
>>> Yes, the schema might be different or the same.
>>> What would be the answer for each situation?
>>>
>>> On Fri, Dec 18, 2015 at 6:02 PM, Ted Yu <[email protected]> wrote:
>>>
>>>> CacheManager#cacheQuery() is called where:
>>>>
>>>>   * Caches the data produced by the logical representation of the given [[Queryable]].
>>>>   ...
>>>>   val planToCache = query.queryExecution.analyzed
>>>>   if (lookupCachedData(planToCache).nonEmpty) {
>>>>
>>>> Is the schema for dfNew different from that of dfOld?
>>>>
>>>> Cheers
>>>>
>>>> On Fri, Dec 18, 2015 at 3:33 AM, Sahil Sareen <[email protected]> wrote:
>>>>
>>>>> Spark 1.5.2
>>>>>
>>>>> dfOld.registerTempTable("oldTableName")
>>>>> sqlContext.cacheTable("oldTableName")
>>>>> // ....
>>>>> // do something
>>>>> // ....
>>>>> dfNew.registerTempTable("oldTableName")
>>>>> sqlContext.cacheTable("oldTableName")
>>>>>
>>>>> Now when I use the "oldTableName" table I do get the latest contents
>>>>> from dfNew, but do the contents of dfOld get removed from memory?
>>>>>
>>>>> Or is the right usage to do this:
>>>>>
>>>>> dfOld.registerTempTable("oldTableName")
>>>>> sqlContext.cacheTable("oldTableName")
>>>>> // ....
>>>>> // do something
>>>>> // ....
>>>>> dfNew.registerTempTable("oldTableName")
>>>>> sqlContext.uncacheTable("oldTableName")   <========== uncache the old contents first
>>>>> sqlContext.cacheTable("oldTableName")
>>>>>
>>>>> -Sahil
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [email protected]
>>>>> For additional commands, e-mail: [email protected]
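Reading the thread, the safer order appears to be uncache first, then re-register, then cache: uncacheTable resolves the name to whatever plan it currently points at, so uncaching after the name already points at dfNew would miss dfOld's entry. A hedged sketch of that order as a small helper (the refreshCachedTable name and function-typed parameters are invented for illustration; in a real app they would be sqlContext.uncacheTable, dfNew.registerTempTable, and sqlContext.cacheTable):

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical helper capturing the refresh order discussed above:
// uncache while the name still resolves to the old plan, then rebind, then cache.
def refreshCachedTable(name: String,
                       uncache: String => Unit,   // e.g. sqlContext.uncacheTable
                       register: String => Unit,  // e.g. dfNew.registerTempTable
                       cache: String => Unit      // e.g. sqlContext.cacheTable
                      ): Unit = {
  uncache(name)   // drop the old cached data while the name still points at it
  register(name)  // bind the name to the new DataFrame's plan
  cache(name)     // cache the new contents
}

// Stubbed usage that just records the call order.
val calls = ListBuffer.empty[String]
refreshCachedTable("oldTableName",
  n => calls += s"uncache($n)",
  n => calls += s"register($n)",
  n => calls += s"cache($n)")
```

The stubs only verify sequencing; whether the old blocks are actually freed immediately still depends on CacheManager unpersisting the old plan's data, as discussed above.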
