RE: Query related to spark cluster

2016-05-30 Thread Kumar, Saurabh 5. (Nokia - IN/Bangalore)
Cc: user@spark.apache.org; Sawhney, Prerna (Nokia - IN/Bangalore) Subject: Re: Query related to spark cluster
> Hi Saurabh, you can have a Hadoop cluster running YARN as the scheduler. Configure Spark to run with the same YARN setup. Then you need R on only one node, and connect to the cluster using

Re: Query related to spark cluster

2016-05-29 Thread Deepak Sharma
Hi Saurabh, you can have a Hadoop cluster running YARN as the scheduler. Configure Spark to run with the same YARN setup. Then you need R on only one node, and connect to the cluster using SparkR. Thanks, Deepak. On Mon, May 30, 2016 at 12:12 PM, Jörn Franke wrote: > Well, if you require R then you
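Deepak's suggestion — R installed on a single gateway node, submitting to an existing YARN cluster — might look roughly like this with the SparkR API of that era (Spark 1.6.x, current at the time of the thread). The app name is illustrative, not from the thread, and the snippet assumes HADOOP_CONF_DIR points at the cluster's configuration:

```r
# R needs to be present only on this gateway node; the YARN cluster
# schedules the actual Spark executors.
library(SparkR)

# "yarn-client" runs the driver locally and the executors on YARN
# (Spark 1.6-style; Spark 2.x replaced this with sparkR.session()).
sc <- sparkR.init(master = "yarn-client", appName = "QueryFromR")
sqlContext <- sparkRSQL.init(sc)
```

From here, DataFrames created through `sqlContext` are distributed across the cluster even though only one node has R.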

RE: Query related to spark cluster

2016-05-29 Thread Kumar, Saurabh 5. (Nokia - IN/Bangalore)
Cc: user@spark.apache.org; Sawhney, Prerna (Nokia - IN/Bangalore) Subject: Re: Query related to spark cluster
> Well, if you require R then you need to install it (including all additional packages) on each node. I am not sure why you store the data in Postgres. Storing it in Parquet or ORC is

Re: Query related to spark cluster

2016-05-29 Thread Jörn Franke
Well, if you require R then you need to install it (including all additional packages) on each node. I am not sure why you store the data in Postgres. Storing it in Parquet or ORC in HDFS (sorted on relevant columns) is sufficient, and you can use the SparkR libraries to access them. > On 30 May 2
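Jörn's alternative — keeping the data as Parquet in HDFS and reading it directly with SparkR rather than going through Postgres — could be sketched as below. The HDFS path and column names are hypothetical, and the calls follow the Spark 1.6-era SparkR API in use when this thread was written:

```r
# Read a Parquet dataset straight from HDFS into a distributed
# DataFrame; no relational database sits in between.
library(SparkR)
sc <- sparkR.init(master = "yarn-client")
sqlContext <- sparkRSQL.init(sc)

# Path and columns are illustrative, not from the thread.
df <- read.df(sqlContext, "hdfs:///data/events.parquet", source = "parquet")

# Predicate pushdown and column pruning mean only the needed
# columns/row groups are read from the Parquet files.
head(select(filter(df, df$country == "DE"), "user_id", "event_time"))
```

Sorting the files on the columns most often filtered (as Jörn suggests) lets Parquet's min/max statistics skip entire row groups at read time.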