Re: DataFrame.withColumn() recomputes columns even after cache()

2015-07-14 Thread pnpritchard
I was able to work around this by converting the DataFrame to an RDD and then back to a DataFrame. This seems very weird to me, so any insight would be much appreciated! Thanks, Nick P.S. Here's the updated code with the workaround: ``` // Example UDFs that println when called val twice =

DataFrame.withColumn() recomputes columns even after cache()

2015-07-14 Thread pnpritchard
Hi! I am seeing some unexpected behavior with regard to cache() in DataFrames. Here goes: In my Scala application, I have created a DataFrame that I run multiple operations on. The DataFrame is expensive to recompute, so I call cache() after it is created. I notice that the cache()