Hi, I have a use case where I want to override the default HDFS replication factor from my Spark code. To do this, I set the Hadoop replication like this:
    val sc = new SparkContext(conf)
    sc.hadoopConfiguration.set("dfs.replication", "1")

My Spark job runs as a cron job at a specific interval and creates an output directory for the corresponding hour. The problem I am facing is that for about 80% of the runs the files are created with replication factor 1 (which is what I want), but for the remaining 20% they are created with the default replication factor of 2. I am not sure why that is happening. Any help would be appreciated.

Thank you,
Divya
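P.S. In case it helps, here is a minimal, self-contained sketch of how the job is wired up. The app name, output-path scheme, and sample data below are placeholders for illustration, not my real code; only the dfs.replication setting is the part in question:

    import org.apache.spark.{SparkConf, SparkContext}
    import java.time.LocalDateTime
    import java.time.format.DateTimeFormatter

    object HourlyJob {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("hourly-job") // placeholder name
        val sc   = new SparkContext(conf)

        // Ask HDFS for replication factor 1 on everything this job writes.
        // Set before any output is written.
        sc.hadoopConfiguration.set("dfs.replication", "1")

        // Illustrative per-hour output directory, e.g. /data/out/2016-05-12-09
        val hour = LocalDateTime.now()
          .format(DateTimeFormatter.ofPattern("yyyy-MM-dd-HH"))
        val out  = s"/data/out/$hour"

        sc.parallelize(Seq("a", "b", "c")).saveAsTextFile(out)
        sc.stop()
      }
    }

I have also seen the same property passed at submit time via --conf spark.hadoop.dfs.replication=1; as far as I understand, Spark copies any spark.hadoop.* property into the Hadoop Configuration it hands to the job, so that may be an alternative worth trying.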