Hi, I have a use case where I want to override the default HDFS replication factor from my Spark code. To do this, I set the Hadoop replication like this:
    val sc = new SparkContext(conf)
    sc.hadoopConfiguration.set("dfs.replication", "1")

My Spark job runs as a cron job at a specific interval and creates an output directory for the corresponding hour. The problem I am facing is that for about 80% of the runs the files are created with replication factor 1 (which is what I want), but for the remaining 20% they are created with the default replication factor of 2. I am not sure why that is happening. Any help would be appreciated.

Thank you,
Divya
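P.S. In case it helps, here is a minimal, self-contained sketch of how the job is wired up. The app name, output-path scheme, and sample data below are placeholders for illustration, not my real code; only the dfs.replication setting is the part in question:

    import org.apache.spark.{SparkConf, SparkContext}
    import java.time.LocalDateTime
    import java.time.format.DateTimeFormatter

    object HourlyJob {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("hourly-job") // placeholder name
        val sc   = new SparkContext(conf)

        // Ask HDFS for replication factor 1 on everything this job writes.
        // Set before any output is written.
        sc.hadoopConfiguration.set("dfs.replication", "1")

        // Illustrative per-hour output directory, e.g. /data/out/2016-05-12-09
        val hour = LocalDateTime.now()
          .format(DateTimeFormatter.ofPattern("yyyy-MM-dd-HH"))
        val out  = s"/data/out/$hour"

        sc.parallelize(Seq("a", "b", "c")).saveAsTextFile(out)
        sc.stop()
      }
    }

I have also seen the same property passed at submit time via --conf spark.hadoop.dfs.replication=1; as far as I understand, Spark copies any spark.hadoop.* property into the Hadoop Configuration it hands to the job, so that may be an alternative worth trying.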