Re: Access to live data of cached dataFrame

2019-05-21 Thread Wenchen Fan
When you cache a dataframe, you actually cache a logical plan. That's why re-creating the dataframe doesn't work: Spark finds out the logical plan is cached and picks the cached data. You need to uncache the dataframe, or go back to the SQL way: df.createTempView("abc"); spark.table("abc").cache()
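
The plan-level caching described above can be observed with a minimal sketch. It assumes a local SparkSession and uses spark.range as a stand-in for the Delta table at "/data"; the names CachedPlanDemo and hourlyCounts are hypothetical and not from the thread. A dataframe rebuilt from the same plan reports a cached storage level until the original is unpersisted.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.storage.StorageLevel

object CachedPlanDemo {
  // Returns (recreatedSeesCache, clearedAfterUnpersist).
  def run(): (Boolean, Boolean) = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("cached-plan-demo")
      .getOrCreate()

    // Assumption: spark.range stands in for
    // spark.read.format("delta").load("/data") from the thread.
    def hourlyCounts(): DataFrame =
      spark.range(100)
        .withColumn("event_hour", col("id") % 24)
        .groupBy(col("event_hour"))
        .count()

    // Cache one instance and materialize it.
    val cached = hourlyCounts().cache()
    cached.count()

    // A freshly built dataframe with an equivalent logical plan
    // is matched against the cache, not recomputed from the source.
    val recreatedSeesCache = hourlyCounts().storageLevel != StorageLevel.NONE

    // Uncaching removes the plan from the cache, so a new
    // dataframe is computed from the source again.
    cached.unpersist()
    val clearedAfterUnpersist = hourlyCounts().storageLevel == StorageLevel.NONE

    spark.stop()
    (recreatedSeesCache, clearedAfterUnpersist)
  }

  def main(args: Array[String]): Unit = println(run())
}
```

The check relies on Dataset.storageLevel, which looks the dataframe's plan up in the shared cache rather than inspecting the object on which cache() was called.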

Re: Access to live data of cached dataFrame

2019-05-19 Thread Tomas Bartalos
I'm trying to re-read, but I'm getting cached data (which is a bit confusing). To re-read, I'm issuing:
spark.read.format("delta").load("/data").groupBy(col("event_hour")).count
The cache seems to be global, also influencing new dataframes. So the question is how should I re-read without loosin

Re: Access to live data of cached dataFrame

2019-05-17 Thread Sean Owen
A cached DataFrame isn't supposed to change, by definition. You can re-read each time or consider setting up a streaming source on the table which provides a result that updates as new data comes in.

On Fri, May 17, 2019 at 1:44 PM Tomas Bartalos wrote:
> Hello,
>
> I have a cached dataframe:
>
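
The streaming option mentioned in this reply could look roughly like the sketch below. It is only an illustration under stated assumptions: the built-in "rate" source stands in for the table (with the Delta Lake library on the classpath, the source would presumably be spark.readStream.format("delta").load("/data")), and the names LiveCountsDemo and live_counts are hypothetical. The aggregate is written to an in-memory table that is refreshed on every micro-batch, so each query against it sees the current counts.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, hour}

object LiveCountsDemo {
  // Returns the column names of the continuously updated result table.
  def run(): Seq[String] = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("live-counts-demo")
      .config("spark.sql.shuffle.partitions", "2") // keep the local run light
      .getOrCreate()

    // Assumption: the "rate" test source replaces the real table.
    val counts = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "10")
      .load()
      .groupBy(hour(col("timestamp")).as("event_hour"))
      .count()

    // Complete mode rewrites the whole aggregate on each micro-batch;
    // the memory sink exposes it as a temp view named by queryName.
    val query = counts.writeStream
      .outputMode("complete")
      .format("memory")
      .queryName("live_counts")
      .start()

    // Let a few micro-batches run, then read the current result.
    query.awaitTermination(5000)
    val live = spark.table("live_counts")
    val columns = live.columns.toSeq
    live.show()

    query.stop()
    spark.stop()
    columns
  }

  def main(args: Array[String]): Unit = println(run())
}
```

The memory sink keeps the full result on the driver, so it suits small aggregates like hourly counts; a larger result would need a different sink.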