Hi,
I am fairly confident I have observed Spark, when configured with the Shuffle
Service, continuing to fetch shuffle files from a node after an executor failure,
rather than recomputing them as it does without the Shuffle Service.
Can anyone confirm this?
(I have a SO question
The cache() method on the DataFrame API caught me out.
Having learnt that DataFrames are built on RDDs and that RDDs are
immutable, when I saw the statement df.cache() in our codebase I thought
‘This must be a bug: the result is not assigned, so the statement will have no
effect.’
However, I’ve sinc
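
To illustrate why an unassigned df.cache() can still do something, here is a minimal sketch in plain Python. It is a toy model, not Spark code: ToyDataFrame, its filter method and its is_cached flag are names made up for this example. The assumption it encodes is that cache() marks the existing object as cached and returns that same object, whereas a transformation returns a new object.

```python
class ToyDataFrame:
    """Toy stand-in for a DataFrame (hypothetical; NOT the Spark API)."""

    def __init__(self, rows):
        self._rows = tuple(rows)   # the data itself is never modified
        self.is_cached = False     # bookkeeping flag, which IS mutable

    def filter(self, predicate):
        # A transformation: returns a NEW object and leaves self untouched.
        return ToyDataFrame(r for r in self._rows if predicate(r))

    def cache(self):
        # Marks THIS object as cached and returns the same object,
        # so ignoring the return value loses nothing.
        self.is_cached = True
        return self


df = ToyDataFrame([1, 2, 3, 4])
df.cache()                               # result not assigned, df is still marked
evens = df.filter(lambda r: r % 2 == 0)  # new object, not marked as cached
```

In the toy model, df.is_cached is true after the bare df.cache() call, while evens.is_cached stays false. On a real DataFrame, inspecting df.storageLevel before and after the call should be a way to check the same thing.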