I was able to work around this by converting the DataFrame to an RDD and then
back to DataFrame. This seems very weird to me, so any insight would be much
appreciated!
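
For anyone who wants to try the same thing, here is a minimal sketch of that round trip. It is not my actual application code: it assumes a Spark 2.x+ `SparkSession` running in local mode (on 1.x, `sqlContext.createDataFrame` should take the same two arguments), and the `noisyDouble` UDF, the column names, and the sample data are all made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object CacheRoundTrip {
  def main(args: Array[String]): Unit = {
    // Assumption: local SparkSession, purely for illustration.
    val spark = SparkSession.builder()
      .appName("cache-round-trip")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical UDF that prints when called, so any recomputation is visible.
    val noisyDouble = udf { (x: Int) => println(s"noisyDouble($x)"); x * 2 }

    // Hypothetical stand-in for the expensive DataFrame.
    val df = Seq(1, 2, 3).toDF("n").withColumn("doubled", noisyDouble(col("n")))

    // The workaround: DataFrame -> RDD[Row] -> DataFrame with the same schema,
    // then cache the rebuilt DataFrame instead of the original one.
    val roundTripped: DataFrame =
      spark.createDataFrame(df.rdd, df.schema).cache()

    roundTripped.count()                            // first action fills the cache
    roundTripped.filter(col("doubled") > 2).count() // later actions reuse roundTripped

    spark.stop()
  }
}
```

Watching when the println output appears across the two actions should show whether the UDF is being re-evaluated after the first one.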
Thanks,
Nick
P.S. Here's the updated code with the workaround:
```
// Example UDFs that println when called
val twice =
```
Hi!
I am seeing some unexpected behavior with regard to cache() on DataFrames.
Here goes:
In my Scala application, I have created a DataFrame that I run multiple
operations on. It is expensive to recompute the DataFrame, so I have called
cache() after it gets created.
I notice that the cache()