On Thu, Mar 12, 2015 at 1:45 AM, wrote:
>
> In your response you say “When you call reduce and *similar* methods,
> each partition can be reduced in parallel. Then the results of that can be
> transferred across the network and reduced to the final result”. By similar
> methods do you mean all ac
Thank you very much for your detailed response, it was very informative and
cleared up some of my misconceptions. After your explanation, I understand that
the distribution of the data and the parallelism are meant to be abstracted
away from the developer.
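
To make the "reduce each partition, then reduce the partial results" idea
concrete, here is a minimal sketch in plain Python. It is not Spark code and
does not use the Spark API: the names split_into_partitions and
partitioned_reduce are hypothetical, the round-robin split is an assumption
made only for illustration, and the partitions are processed sequentially
here rather than on separate machines.

```python
from functools import reduce
from operator import add


def split_into_partitions(data, num_partitions):
    """Hypothetical helper: deal the elements round-robin into num_partitions lists."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def partitioned_reduce(data, op, num_partitions=4):
    """Reduce each partition on its own, then reduce the partial results."""
    partitions = split_into_partitions(data, num_partitions)
    # Step 1: each non-empty partition is reduced independently of the others.
    # In a real cluster this is the part that could run in parallel.
    partials = [reduce(op, part) for part in partitions if part]
    # Step 2: only the small list of partial results has to be brought
    # together and reduced to the final value.
    return reduce(op, partials)


total = partitioned_reduce(list(range(1, 101)), add)
print(total)
```

The two-step result only matches a single sequential reduce when op is
associative and commutative (addition, max, and so on), because the elements
are regrouped across partitions. This sketch raises a TypeError on an empty
input.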
In your response you say “When you ca
An RDD is a Resilient *Distributed* Dataset. The partitioning and
distribution of the data happens in the background. You'll occasionally
need to concern yourself with it (especially to get good performance), but
from an API perspective it's mostly invisible (some methods do allow you to
specify a