Hi, I am a little confused here. If I am writing to HDFS, shouldn't the HDFS replication factor automatically kick in? In other words, how is the Spark writer different from an "hdfs -put" command (from the perspective of HDFS, of course)?
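For comparison, the replication factor of a plain "put" is also decided by client-side configuration, not by the NameNode alone, so it can be overridden per command. A sketch (the paths are hypothetical, and this assumes a running HDFS cluster, so it is illustration only):

```shell
# Override dfs.replication for this one upload; the client-side value wins.
hdfs dfs -D dfs.replication=2 -put localfile.parquet /user/data/

# Change the replication factor of files that already exist on HDFS
# (-w waits until the change has propagated).
hdfs dfs -setrep -w 2 /user/data/existing.parquet
```

In that sense a Spark writer behaves the same way as -put: whatever "dfs.replication" the writing client (here, the Spark executors' Hadoop Configuration) carries is what the new blocks get.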
Best
Ayan

On Tue, Jun 9, 2015 at 5:17 PM, Haopu Wang <hw...@qilinsoft.com> wrote:

> Cheng,
>
> Yes, it works. I set the property in SparkConf before initiating
> SparkContext.
> The property name is "spark.hadoop.dfs.replication".
> Thanks for the help!
>
> -----Original Message-----
> From: Cheng Lian [mailto:lian.cs....@gmail.com]
> Sent: Monday, June 08, 2015 6:41 PM
> To: Haopu Wang; user
> Subject: Re: SparkSQL: How to specify replication factor on the
> persisted parquet files?
>
> Then one possible workaround is to set "dfs.replication" in
> "sc.hadoopConfiguration".
>
> However, this configuration is shared by all Spark jobs issued within
> the same application. Since different Spark jobs can be issued from
> different threads, you need to pay attention to synchronization.
>
> Cheng
>
> On 6/8/15 2:46 PM, Haopu Wang wrote:
>> Cheng, thanks for the response.
>>
>> Yes, I was using HiveContext.setConf() to set "dfs.replication".
>> However, I cannot change the value in the Hadoop core-site.xml,
>> because that would change every HDFS file.
>> I only want to change the replication factor of some specific files.
>>
>> -----Original Message-----
>> From: Cheng Lian [mailto:lian.cs....@gmail.com]
>> Sent: Sunday, June 07, 2015 10:17 PM
>> To: Haopu Wang; user
>> Subject: Re: SparkSQL: How to specify replication factor on the
>> persisted parquet files?
>>
>> Were you using HiveContext.setConf()?
>>
>> "dfs.replication" is a Hadoop configuration, but setConf() is only
>> used to set Spark SQL specific configurations. You may instead set it
>> in your Hadoop core-site.xml.
>>
>> Cheng
>>
>> On 6/2/15 2:28 PM, Haopu Wang wrote:
>>> Hi,
>>>
>>> I'm trying to save a Spark SQL DataFrame to a persistent Hive table
>>> using the default parquet data source.
>>>
>>> I don't know how to change the replication factor of the generated
>>> parquet files on HDFS.
>>>
>>> I tried to set "dfs.replication" on HiveContext but that didn't work.
>>> Any suggestions are appreciated very much!
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org

--
Best Regards,
Ayan Guha
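The two workarounds discussed in this thread can be sketched as follows. This is a minimal illustration in the Spark 1.x Scala API used here, assuming a Hive-enabled Spark build and a running HDFS cluster; the app name and output path are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Approach 1 (what worked for Haopu): set the Hadoop property through
// SparkConf before creating the SparkContext. Any "spark.hadoop.*" key is
// copied into the Hadoop Configuration that the writers use.
val conf = new SparkConf()
  .setAppName("replication-example")
  .set("spark.hadoop.dfs.replication", "2")
val sc = new SparkContext(conf)

// Approach 2 (Cheng's workaround): mutate the Hadoop configuration on an
// existing context. Note this object is shared by all jobs in the
// application, so synchronize if multiple threads submit jobs.
sc.hadoopConfiguration.set("dfs.replication", "2")

val sqlContext = new HiveContext(sc)

// Parquet files written from now on get replication factor 2.
// ("my_table" is a hypothetical source table.)
val df = sqlContext.table("my_table")
df.write.format("parquet").saveAsTable("my_table_copy")
```

Note that HiveContext.setConf() would not help here, for the reason Cheng gives: it only affects Spark SQL's own configuration keys, not the Hadoop Configuration consulted when blocks are written to HDFS.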