Cc: user@spark.apache.org; Sawhney, Prerna (Nokia - IN/Bangalore)
Subject: Re: Query related to spark cluster
Hi Saurabh
You can have a Hadoop cluster running YARN as the scheduler.
Configure Spark to run with the same YARN setup.
Then you need R only on one node, and connect to the cluster using SparkR.
Thanks
Deepak
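
For reference, a minimal sketch of that setup in SparkR (assuming Spark 2.x, with SPARK_HOME and HADOOP_CONF_DIR on the driver node pointing at the YARN cluster; the app name is just a placeholder):

  # Load SparkR from the local Spark installation (the only node that needs R)
  library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib"))

  # Start a session against YARN in client mode; executors run on the cluster,
  # and they only need R themselves if you use R UDFs (dapply/gapply/spark.lapply)
  sparkR.session(master = "yarn", appName = "sparkr-on-yarn")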
On Mon, May 30, 2016 at 12:12 PM, Jörn Franke wrote:
>
> Well if you require R then you need to install it (including all additional
> packages) on each node. I am not sure why you store the data in Postgres.
> Storing it in Parquet or ORC in HDFS (sorted on relevant columns) is
> sufficient, and you can use the SparkR libraries to access them.
> On 30 May 2
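
To illustrate the Parquet-on-HDFS suggestion quoted above, a small sketch of reading such data with SparkR (assuming Spark 2.x; the HDFS path and column names are made-up placeholders):

  library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib"))
  sparkR.session(master = "yarn", appName = "sparkr-read-parquet")

  # Read a Parquet dataset directly from HDFS into a SparkDataFrame
  events <- read.df("hdfs:///data/events.parquet", source = "parquet")

  # Filtering and projection run on the cluster; collect() brings the
  # (hopefully small) result back into the local R session
  result <- collect(select(filter(events, events$country == "IN"), "user_id", "country"))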