Your approach still requires that all data be sent through the master every 
time you want to process it. 
You probably want to use Hadoop HDFS as a distributed file system and exploit 
its data locality features; that seems very well suited to your scenario. 
However, I recommend using it with a Hadoop distribution that includes Spark. 
Among those, Windows is supported only by Hortonworks. Alternatively, you can 
build your own distribution. That is feasible, but more effort. A sketch of 
both approaches follows below.
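To make the difference concrete, here is a minimal sketch in Scala using the 
RDD API. The hostnames, share name, and HDFS port are placeholders for 
illustration, and the exact file:// URI form for a UNC share can vary with 
the Hadoop version:

import org.apache.spark.{SparkConf, SparkContext}

object SharedDataSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SharedDataSketch"))

    // Option A: Windows shared folder. Every worker must resolve the same
    // path, and every read is pulled over the LAN from the machine hosting
    // the share.
    val fromShare = sc.textFile("file:////master/shared/data/*.txt")

    // Option B: HDFS. Blocks are spread across the cluster's disks and Spark
    // schedules each task close to the block it reads (data locality).
    val fromHdfs = sc.textFile("hdfs://master:8020/data/*.txt")

    println(s"share: ${fromShare.count()} lines, hdfs: ${fromHdfs.count()} lines")
    sc.stop()
  }
}

With option A, the master's disk and network interface become the bottleneck 
for a dataset of a few hundred GB; with option B, each worker mostly reads 
blocks stored on its own disk.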

> On 27 Nov 2015, at 23:29, Shuo Wang <shuo.x.w...@gmail.com> wrote:
> 
> Hi,
> 
> I am trying to build a small home Spark cluster on Windows. I have a 
> question regarding how to share the data files for the master node and worker 
> nodes to process. The data files are pretty large, a few hundred GB.
> 
> Can I just use a Windows shared folder as the file path for my driver/master 
> and worker nodes, where my worker nodes are on the same LAN as my 
> driver/master, and the shared folder is on my master node? 
> 
> -- 
> 王硕
> Email: shuo.x.w...@gmail.com
> Whatever your journey, keep walking.
