So even on RDDs, cache() and persist() mutate the RDD object. The important
thing for Spark is that the data represented by the RDD/DataFrame isn’t mutated.
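
A toy sketch of that distinction, in plain Python rather than Spark itself.
ToyDataset and its fields are made up for illustration; the only Spark
behaviour assumed is that cache() marks the object for caching and returns
that same object, which is why the result need not be assigned.

```python
class ToyDataset:
    """Hypothetical stand-in for an RDD/DataFrame; not Spark's implementation."""

    def __init__(self, rows):
        self._rows = tuple(rows)   # the data itself is never changed
        self.is_cached = False     # bookkeeping held on the handle object

    def cache(self):
        # Mutates only the bookkeeping flag and returns the same object,
        # so `ds.cache()` without an assignment still takes effect.
        self.is_cached = True
        return self

    def map(self, fn):
        # A transformation builds a new dataset and leaves this one untouched.
        return ToyDataset(fn(r) for r in self._rows)

    def collect(self):
        return list(self._rows)


ds = ToyDataset([1, 2, 3])
before = ds.collect()
same = ds.cache()                   # return value is the same handle
doubled = ds.map(lambda x: x * 2)   # new dataset; ds's rows are unchanged
```

Here the handle changes (is_cached flips) while the rows it describes stay
the same, which is the sense in which the data is immutable.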
On Mon, May 25, 2020 at 10:56 AM Chris Thomas wrote:
The cache() method on the DataFrame API caught me out.
Having learnt that DataFrames are built on RDDs and that RDDs are
immutable, when I saw the statement df.cache() in our codebase I thought
‘This must be a bug, the result is not assigned, the statement will have no
effect.’
However, I’ve sinc