So I looked at the function. My only worry is whether the cache gets
cleared when I overwrite the cached table by re-registering under the same
name. I ran the experiment below and isCached now reports the table as not
cached, but I want to confirm this. Beyond queries no longer seeing the old
table's values, is the old cached data actually removed/overwritten in memory?
scala> df.collect
res54: Array[org.apache.spark.sql.Row] = Array([blue,#0033FF],
[red,#FF0000], [green,#FSKA]) <=== 3 rows
scala> df2.collect
res55: Array[org.apache.spark.sql.Row] = Array([blue,#0033FF],
[red,#FF0000]) <=== 2 rows
scala> df.registerTempTable("myColorsTable")
scala> sqlContext.isCached("myColorsTable")
res58: Boolean = false
scala> sqlContext.cacheTable("myColorsTable") <=== cache table in df(3 rows)
scala> sqlContext.isCached("myColorsTable")
res60: Boolean = true
scala> sqlContext.sql("select * from myColorsTable").foreach(println)
<=== sql is running on df(3 rows)
[blue,#0033FF]
[red,#FF0000]
[green,#FSKA]
scala> df2.registerTempTable("myColorsTable") <=== register another
table with the same table name
scala> sqlContext.isCached("myColorsTable") <=== table no longer shows as cached
res63: Boolean = false
scala> sqlContext.sql("select * from myColorsTable").foreach(println)
<=== sql is running on df2(2 rows)
[blue,#0033FF]
[red,#FF0000]
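
For the "is it actually removed from memory?" part, the safer pattern looks
like the explicit uncache suggested earlier in the thread. Below is a minimal
sketch, assuming a Spark 1.5.x spark-shell where sqlContext and sc are already
in scope; using sc.getPersistentRDDs to peek at what is still cached is just
one way I can think of to check, the Storage tab in the web UI should show the
same thing.

// Sketch only: uncache explicitly before re-registering the same table name,
// so the old cached data is dropped instead of merely being shadowed.
df.registerTempTable("myColorsTable")
sqlContext.cacheTable("myColorsTable")      // cache df (3 rows)

// ... work with the table ...

sqlContext.uncacheTable("myColorsTable")    // free the old cached data first
df2.registerTempTable("myColorsTable")      // re-register with df2 (2 rows)
sqlContext.cacheTable("myColorsTable")      // cache the new contents

// One way to see what is still held in memory: list the persistent RDDs
// (a cached table shows up backed by an InMemoryRelation).
sc.getPersistentRDDs.foreach { case (id, rdd) => println(s"$id -> ${rdd.name}") }

If the old blocks still show up after only re-registering (without the
uncacheTable call), that would answer the removed-from-memory question.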
On Fri, Dec 18, 2015 at 11:17 PM, Ted Yu <[email protected]> wrote:
> This method in CacheManager:
> private[sql] def lookupCachedData(plan: LogicalPlan): Option[CachedData] = readLock {
>   cachedData.find(cd => plan.sameResult(cd.plan))
> }
>
> led me to the following in
> sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
> :
>
> def sameResult(plan: LogicalPlan): Boolean = {
>
> There is a detailed comment above this method which should give some idea.
>
> Cheers
>
> On Fri, Dec 18, 2015 at 9:21 AM, Sahil Sareen <[email protected]> wrote:
>
>> Thanks Ted!
>>
>> Yes, The schema might be different or the same.
>> What would be the answer for each situation?
>>
>> On Fri, Dec 18, 2015 at 6:02 PM, Ted Yu <[email protected]> wrote:
>>
>>> CacheManager#cacheQuery() is called where:
>>> * Caches the data produced by the logical representation of the given
>>> [[Queryable]].
>>> ...
>>> val planToCache = query.queryExecution.analyzed
>>> if (lookupCachedData(planToCache).nonEmpty) {
>>>
>>> Is the schema for dfNew different from that of dfOld ?
>>>
>>> Cheers
>>>
>>> On Fri, Dec 18, 2015 at 3:33 AM, Sahil Sareen <[email protected]>
>>> wrote:
>>>
>>>> Spark 1.5.2
>>>>
>>>> dfOld.registerTempTable("oldTableName")
>>>> sqlContext.cacheTable("oldTableName")
>>>> // ....
>>>> // do something
>>>> // ....
>>>> dfNew.registerTempTable("oldTableName")
>>>> sqlContext.cacheTable("oldTableName")
>>>>
>>>>
>>>> Now when I use the "oldTableName" table I do get the latest contents
>>>> from dfNew but do the contents of dfOld get removed from the memory?
>>>>
>>>> Or is the right usage to do this:
>>>> dfOld.registerTempTable("oldTableName")
>>>> sqlContext.cacheTable("oldTableName")
>>>> // ....
>>>> // do something
>>>> // ....
>>>> dfNew.registerTempTable("oldTableName")
>>>> sqlContext.uncacheTable("oldTableName") <========== uncache the old
>>>> contents first
>>>> sqlContext.cacheTable("oldTableName")
>>>>
>>>> -Sahil
>>>>
>>>>
>>>>
>>>
>>
>