Commit e5c4cd8a5e188592f8786a265 is from 2011. Not sure why you started with such an early commit.
Spark project has evolved quite fast.

I suggest you clone Spark project from github.com/apache/spark/ and start with core/src/main/scala/org/apache/spark/rdd/RDD.scala

Cheers

On Sun, Jul 19, 2015 at 7:44 PM, Yang <teddyyyy...@gmail.com> wrote:

> I'm trying to understand how spark works under the hood, so I tried to
> read the source code.
>
> as I normally do, I downloaded the git source code, reverted to the very
> first version ( actually e5c4cd8a5e188592f8786a265c0cd073c69ac886 since the
> first version even lacked the definition of RDD.scala)
>
> but the code looks "too simple" and I can't find where the "magic"
> happens, i.e. a transformation /computation is scheduled on a machine,
> bytes stored etc.
>
> it would be great if someone could show me a path in which the different
> source files are involved, so that I could read each of them in turn.
>
> thanks!
> yang
>