Hi,

Consider I'm running WordCount with 100 MB of data on a 4-node cluster.
Assume each node has 200 GB of RAM and I'm giving my executors 100 GB
(more than enough memory for 100 MB of data).
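For concreteness, the WordCount I have in mind is roughly the standard
Spark Scala example below; this is only a sketch, and the input/output
paths are placeholders, not real locations. The shuffle in question is
the one triggered by reduceByKey:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WordCount"))
    sc.textFile("hdfs:///input/words.txt")   // placeholder path, ~100 MB input
      .flatMap(_.split("\\s+"))              // map side: split lines into words
      .map(word => (word, 1))
      .reduceByKey(_ + _)                    // shuffle boundary: counts are combined per key
      .saveAsTextFile("hdfs:///output/wordcounts")  // placeholder path
    sc.stop()
  }
}
```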


   1. If I have enough memory, can Spark avoid writing to disk entirely?
   2. During a shuffle, where results have to be collected from the nodes,
   does each node write its output to disk, with the results then pulled
   from disk? If not, what API is used to pull data from nodes across the
   cluster? (I'm wondering what Scala or Java packages would allow you to
   read in-memory data from other machines.)

Thanks,
