Any response to this?
1. How do I know which statements in a Spark script will be executed on the
worker side in a given stage?
e.g. if I have
val x = 1 (or any other code)
in my driver code, will the same statement also be executed on the worker
side in a stage?
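To make question 1 concrete, here is a minimal sketch of the kind of script
I mean. The object name, app name, and the "local[2]" master are placeholders
I made up. As far as I understand (and I would like confirmation), only the
function literals passed to RDD operations are shipped to the workers, while
every other statement runs once on the driver:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DriverVsWorker {
  // Returns the summed line lengths, each shifted by a driver-side value.
  def run(sc: SparkContext): Int = {
    val x = 1  // ordinary statement: evaluated once, in the driver JVM

    // Only builds the RDD definition on the driver; no data is touched yet.
    val lines = sc.parallelize(Seq("a", "bb", "ccc"))

    // The function literal is serialized and executed on the workers, once
    // per element; `x` reaches them as a value captured in the closure.
    val lengths = lines.map(line => line.length + x)

    // The action starts the job; the combined result returns to the driver.
    lengths.reduce(_ + _)
  }

  def main(args: Array[String]): Unit = {
    // "local[2]" is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("driver-vs-worker").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(run(sc))  // println itself runs on the driver
    finally sc.stop()
  }
}
```

If that reading is right, `val x = 1` is never re-executed on a worker; the
workers only see the already-computed value 1 inside the closure.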
2. How can I do a map-side join in Spark:
a. without broadcast (i.e. by reading a file once in each executor)?
b. with broadcast, but by broadcasting the complete RDD to each executor?
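For option (b), this is roughly what I have in mind, as a sketch only. The
names `BroadcastJoinSketch`, `broadcastJoin`, `large`, and `small` are made
up for illustration. It assumes the smaller side fits in memory on the driver
and on each executor, and that its keys are unique:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object BroadcastJoinSketch {
  // Inner join of `large` with `small` that avoids shuffling `large`.
  def broadcastJoin(sc: SparkContext,
                    large: RDD[(Int, String)],
                    small: RDD[(Int, String)]): RDD[(Int, (String, String))] = {
    // collect() pulls the whole small side to the driver first.
    // toMap keeps a single value per key (assumes unique keys in `small`).
    val lookup = sc.broadcast(small.collect().toMap)

    // Each task reads the broadcast copy that is local to its executor.
    large.flatMap { case (k, v) =>
      lookup.value.get(k).map(w => (k, (v, w)))  // keys without a match are dropped
    }
  }

  def main(args: Array[String]): Unit = {
    // "local[2]" is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("broadcast-join").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val large = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c")))
      val small = sc.parallelize(Seq((1, "x"), (3, "z")))
      broadcastJoin(sc, large, small).collect().foreach(println)
    } finally sc.stop()
  }
}
```

Note that an RDD itself cannot be broadcast usefully; the sketch broadcasts
its collected contents instead, which is why the size limit matters.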
Regards
- Saurabh Wadhawan
On 19-Oct-2014, at 1:54 am, Saurabh Wadhawan
<[email protected]> wrote:
Hi,
I have the following questions:
1. When I write a Spark script, how do I know which part runs on the driver
side and which runs on the worker side?
So let's say I write code to read a plain text file.
Will it run on the driver side only, on the worker side only, or on both
sides?
2. If I want each worker to load a file for, let's say, a join, and the file
is pretty huge, let's say in the GBs, so that I don't want to broadcast it,
then what's the best way to do it?
Another way to say the same thing: how do I load a data structure for fast
lookup (and not an RDD) on each worker node, inside the executor?
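One pattern I am considering for this is sketched below; I am not sure it is
the recommended way. The object name, the path `/tmp/lookup.tsv`, and the
tab-separated "key, value" layout are all assumptions for illustration. The
file is assumed to be present at the same local path on every worker, and the
resulting map has to fit in each executor's heap:

```scala
import org.apache.spark.rdd.RDD
import scala.io.Source

object PerExecutorLookup {
  // Hypothetical path; assumed to exist on every worker machine
  // (copied there beforehand or available on a shared mount).
  val LookupPath = "/tmp/lookup.tsv"

  // A lazy val in a singleton object is initialized at most once per JVM,
  // so each executor reads the file on first use and then reuses the map.
  lazy val table: Map[String, String] = {
    val src = Source.fromFile(LookupPath)
    try {
      src.getLines().map { line =>
        val Array(k, v) = line.split("\t", 2)  // assumes "key<TAB>value" lines
        k -> v
      }.toMap
    } finally src.close()
  }

  // Inner join of `records` against the locally loaded table, with no
  // broadcast and no shuffle.
  def joinWithTable(records: RDD[(String, String)]): RDD[(String, (String, String))] =
    records.mapPartitions { iter =>
      val t = table  // resolved inside the task, i.e. on the executor
      iter.flatMap { case (k, v) => t.get(k).map(w => (k, (v, w))) }
    }
}
```

Because `table` lives in an object rather than in the closure, nothing large
is serialized with the task; each executor JVM pays the load cost once.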
Regards
- Saurabh