I'm trying to understand how Spark works under the hood, so I tried to read the source code.
As I normally do, I downloaded the source from git and went back to the very first version (actually e5c4cd8a5e188592f8786a265c0cd073c69ac886, since the first version didn't even contain RDD.scala). But the code looks "too simple", and I can't find where the "magic" happens, i.e. where a transformation or computation gets scheduled on a machine, where bytes are stored, etc. It would be great if someone could show me a path through the source files involved, so that I could read each of them in turn.

Thanks!
Yang