It worked. On Thu, Jun 4, 2015 at 5:14 PM, MEETHU MATHEW <meethu2...@yahoo.co.in> wrote:
> Try using coalesce
>
> Thanks & Regards,
> Meethu M
>
>
> On Wednesday, 3 June 2015 11:26 AM, "ÐΞ€ρ@Ҝ (๏̯͡๏)" <deepuj...@gmail.com> wrote:
>
> I am running a series of Spark functions with 9000 executors, and it is
> resulting in 9000+ files, which is exceeding the namespace file count quota.
>
> How can Spark be configured to use CombinedOutputFormat?
>
> {code}
> protected def writeOutputRecords(detailRecords:
>     RDD[(AvroKey[DetailOutputRecord], NullWritable)], outputDir: String) {
>   val writeJob = new Job()
>   val schema = SchemaUtil.outputSchema(_detail)
>   AvroJob.setOutputKeySchema(writeJob, schema)
>   detailRecords.saveAsNewAPIHadoopFile(outputDir,
>     classOf[AvroKey[GenericRecord]],
>     classOf[org.apache.hadoop.io.NullWritable],
>     classOf[AvroKeyOutputFormat[GenericRecord]],
>     writeJob.getConfiguration)
> }
> {code}
>
> --
> Deepak
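The coalesce suggestion above, applied to the posted method, might look like this. This is a sketch, not code from the thread: the target partition count of 100 is illustrative, and the method body otherwise mirrors the original.

{code}
// Sketch of the coalesce fix: Spark writes one output file per partition,
// so merging 9000+ partitions down before the write reduces the file count.
// The target of 100 is an illustrative assumption, not a value from the thread.
protected def writeOutputRecords(detailRecords:
    RDD[(AvroKey[DetailOutputRecord], NullWritable)], outputDir: String) {
  val writeJob = new Job()
  val schema = SchemaUtil.outputSchema(_detail)
  AvroJob.setOutputKeySchema(writeJob, schema)
  detailRecords
    .coalesce(100)  // narrow dependency: merges partitions without a full shuffle
    .saveAsNewAPIHadoopFile(outputDir,
      classOf[AvroKey[GenericRecord]],
      classOf[org.apache.hadoop.io.NullWritable],
      classOf[AvroKeyOutputFormat[GenericRecord]],
      writeJob.getConfiguration)
}
{code}

Note that coalesce avoids a shuffle by default, so upstream parallelism can drop to the target count; repartition(n) forces a shuffle but keeps the preceding stages at full parallelism.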