Try using coalesce.

Thanks & Regards,
Meethu M
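A minimal sketch of what that could look like, applied to the snippet below — note that the target partition count of 200 is an arbitrary assumption you would tune to your data size, and `detailRecords`/`writeJob` are the names from your own code:

```scala
// Hypothetical sketch: reduce the number of output files by shrinking the
// partition count before the save. coalesce(n) merges existing partitions
// without a full shuffle, so each of the n partitions writes one file.
val coalesced = detailRecords.coalesce(200)  // 200 is an assumed target, tune as needed

coalesced.saveAsNewAPIHadoopFile(outputDir,
  classOf[AvroKey[GenericRecord]],
  classOf[org.apache.hadoop.io.NullWritable],
  classOf[AvroKeyOutputFormat[GenericRecord]],
  writeJob.getConfiguration)
```

If the partitions are heavily skewed, `repartition(n)` (which does shuffle) may give more evenly sized output files at the cost of extra network I/O.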

     On Wednesday, 3 June 2015 11:26 AM, "ÐΞ€ρ@Ҝ (๏̯͡๏)" <deepuj...@gmail.com> 
wrote:
   

I am running a series of Spark jobs with 9000 executors, and this produces 9000+ output files, which exceeds the namespace file-count quota.
How can Spark be configured to use CombinedOutputFormat?
{code}
protected def writeOutputRecords(detailRecords: RDD[(AvroKey[DetailOutputRecord], NullWritable)], outputDir: String) {
  val writeJob = new Job()
  val schema = SchemaUtil.outputSchema(_detail)
  AvroJob.setOutputKeySchema(writeJob, schema)
  detailRecords.saveAsNewAPIHadoopFile(outputDir,
    classOf[AvroKey[GenericRecord]],
    classOf[org.apache.hadoop.io.NullWritable],
    classOf[AvroKeyOutputFormat[GenericRecord]],
    writeJob.getConfiguration)
}
{code}

-- 
Deepak


  
