Hi

I am a little confused here. If I am writing to HDFS, shouldn't the HDFS
replication factor automatically kick in? In other words, how is the Spark
writer different from an hdfs -put command (from the perspective of HDFS,
of course)?

Best
Ayan

On Tue, Jun 9, 2015 at 5:17 PM, Haopu Wang <hw...@qilinsoft.com> wrote:

> Cheng,
>
> yes, it works. I set the property in SparkConf before initializing
> SparkContext.
> The property name is "spark.hadoop.dfs.replication".
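>
> For reference, a minimal sketch of what worked for me (the app name and
> replication factor below are placeholders):
>
>     import org.apache.spark.{SparkConf, SparkContext}
>
>     // Properties prefixed with "spark.hadoop." are copied into the
>     // Hadoop Configuration that Spark uses when writing to HDFS.
>     val conf = new SparkConf()
>       .setAppName("parquet-replication")          // placeholder name
>       .set("spark.hadoop.dfs.replication", "2")   // desired factor
>     val sc = new SparkContext(conf)
>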
> Thanks for the help!
>
> -----Original Message-----
> From: Cheng Lian [mailto:lian.cs....@gmail.com]
> Sent: Monday, June 08, 2015 6:41 PM
> To: Haopu Wang; user
> Subject: Re: SparkSQL: How to specify replication factor on the
> persisted parquet files?
>
> Then one possible workaround is to set "dfs.replication" in
> "sc.hadoopConfiguration".
>
> However, this configuration is shared by all Spark jobs issued within
> the same application. Since different Spark jobs can be issued from
> different threads, you need to pay attention to synchronization.
>
> Cheng
>
> On 6/8/15 2:46 PM, Haopu Wang wrote:
> > Cheng, thanks for the response.
> >
> > Yes, I was using HiveContext.setConf() to set "dfs.replication".
> > However, I cannot change the value in Hadoop core-site.xml because that
> > will change every HDFS file.
> > I only want to change the replication factor of some specific files.
> >
> > -----Original Message-----
> > From: Cheng Lian [mailto:lian.cs....@gmail.com]
> > Sent: Sunday, June 07, 2015 10:17 PM
> > To: Haopu Wang; user
> > Subject: Re: SparkSQL: How to specify replication factor on the
> > persisted parquet files?
> >
> > Were you using HiveContext.setConf()?
> >
> > "dfs.replication" is a Hadoop configuration, but setConf() is only
> used
> > to set Spark SQL specific configurations. You may either set it in
> your
> > Hadoop core-site.xml.
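> >
> > For example, something like this (a sketch; the value is a placeholder,
> > and note it applies to everything written with that configuration):
> >
> >     <property>
> >       <name>dfs.replication</name>
> >       <value>2</value>
> >     </property>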
> >
> > Cheng
> >
> >
> > On 6/2/15 2:28 PM, Haopu Wang wrote:
> >> Hi,
> >>
> >> I'm trying to save a SparkSQL DataFrame to a persistent Hive table using
> >> the default parquet data source.
> >>
> >> I don't know how to change the replication factor of the generated
> >> parquet files on HDFS.
> >>
> >> I tried to set "dfs.replication" on HiveContext but that didn't work.
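> >>
> >> For context, roughly what I tried (a sketch assuming the Spark 1.x API;
> >> the table name is made up):
> >>
> >>     // Attempted to lower the replication factor (did not take effect):
> >>     hiveContext.setConf("dfs.replication", "2")
> >>     // Parquet files were still written with the default replication:
> >>     df.saveAsTable("my_table")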
> >> Any suggestions are appreciated very much!
> >>
> >>
> >
>
>
>
>


-- 
Best Regards,
Ayan Guha
