Thank you, Tobias. I will look into the Spark paper. But it looks like the 
paper has been moved:
http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
A "Resource not found" page is returned when I access it.



bit1...@163.com
 
From: Tobias Pfeiffer
Date: 2015-01-07 09:24
To: Todd
CC: user
Subject: Re: I think I am almost lost in the internals of Spark
Hi,

On Tue, Jan 6, 2015 at 11:24 PM, Todd <bit1...@163.com> wrote:
I am a bit new to Spark, except that I have tried simple things like word count 
and the examples given in the Spark SQL programming guide.
Now I am investigating the internals of Spark, but I think I am almost lost, 
because I could not grasp a whole picture of what Spark does when it executes 
word count.

I recommend understanding what an RDD is and how it is processed, using
  
http://spark.apache.org/docs/latest/programming-guide.html#resilient-distributed-datasets-rdds
and probably also
  http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
  (once the server is back).
Understanding how an RDD is processed is probably most helpful to understand 
the whole of Spark.
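As a rough illustration of what the programming guide describes, word count can be read as a chain of RDD transformations: flatMap (lines to words), map (words to (word, 1) pairs), and reduceByKey (sum the counts per word, which in Spark triggers a shuffle). The sketch below mimics those three stages in plain Python on an in-memory list, so it needs no Spark installation; the stage names mirror the RDD API, but this is only an illustration of the data flow, not Spark itself.

```python
# Input "partition": a list of lines, standing in for an RDD of lines.
lines = ["to be or not to be", "to see or not to see"]

# Stage 1: flatMap -- split each line into words, yielding one element per word.
words = [w for line in lines for w in line.split()]

# Stage 2: map -- pair each word with an initial count of 1.
pairs = [(w, 1) for w in words]

# Stage 3: reduceByKey -- sum the counts for each key. In Spark this is the
# step that shuffles data so identical keys from all partitions meet.
counts = {}
for word, n in pairs:
    counts[word] = counts.get(word, 0) + n

print(counts)  # e.g. {'to': 4, 'be': 2, 'or': 2, 'not': 2, 'see': 2}
```

In real Spark, each stage is lazy: nothing runs until an action (like collect or count) forces evaluation, at which point the scheduler turns this lineage into tasks.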

Tobias
