You seem to be a bit confused about the master node, the slave nodes and the
driver machine.

1. The master node can be kept on a smaller machine in your dev environment;
in production you will mostly be using the Mesos or YARN cluster manager.

2. Now, if you are running your driver program (the streaming job) on the
master machine, then that machine needs access to HDFS, or to wherever the
checkpoints are being written.

3. The master node is more like a control node, so yes, a smaller machine would
do. But when you run the driver program on the master machine, it is good to
have enough memory and cores there for your job to run with low latency.
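To make point 2 concrete, here is a minimal Scala sketch of a checkpointed
streaming job, assuming the Spark 1.4-era Spark Streaming API. The HDFS URI,
the object name and the socket source on localhost:9999 are hypothetical
placeholders. The checkpoint directory is set and read by code that runs in
the driver, which is why the machine hosting the driver must be able to reach
that file system.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointedStream {
  // Hypothetical checkpoint location; the driver's machine (and the
  // executors) must be able to resolve and write to this HDFS path.
  val checkpointDir = "hdfs://namenode:8020/user/jan/checkpoints"

  // Builds a fresh context; only called when no checkpoint exists yet.
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("CheckpointedStream")
    val ssc = new StreamingContext(conf, Seconds(10))
    // The driver periodically writes metadata checkpoints here.
    ssc.checkpoint(checkpointDir)
    // Hypothetical source, used only to give the job an output operation.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.count().print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // On restart, the driver rebuilds the context from the checkpoint data
    // instead of calling createContext again.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

When this is submitted with spark-submit in client mode (the default) from
the master machine, the driver, and therefore the checkpoint reads and writes
above, runs on the master itself. The driver's memory for point 3 can be
raised with spark-submit's --driver-memory option.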

Thanks
Best Regards

On Mon, Jul 13, 2015 at 1:04 AM, algermissen1971 <algermissen1...@icloud.com> wrote:

> Hi,
>
> I have a question that I really have problems with figuring out for myself:
>
> Does the master node in a spark cluster need to be a node similar to the
> slave nodes or should I rather view it as a coordinating node, that does
> not need much computing or storage power?
>
> For example, when using Spark Streaming and Checkpointing, would the
> master node need access to the shared file system (e.g. HDFS)? Or do I only
> need to mount that on the slaves?
> (likewise, if I use the Cassandra-Connector, does that (and C*) need to be
> installed on the master node, too?)
>
> Or, in other words: is the master just one node of similar cluster nodes,
> or is it merely a 'small control node', for which sort of any small VM
> would do?
>
> Jan
>