Re: increase parallelism of reading from hdfs

2014-08-11 Thread Chen Song

Thanks Paul. I will give it a try.

On Mon, Aug 11, 2014 at 1:11 PM, Paul Hamilton wrote:
> Hi Chen,
>
> You need to set the max input split size so that the underlying Hadoop
> libraries will calculate the splits appropriately. I have done the
> following successfully:
>
> val job = new Job()
> F

Re: increase parallelism of reading from hdfs

2014-08-11 Thread Paul Hamilton

Hi Chen,

You need to set the max input split size so that the underlying Hadoop libraries will calculate the splits appropriately. I have done the following successfully:

    val job = new Job()
    FileInputFormat.setMaxInputSplitSize(job, 12800L)

And then use job.getConfiguration when creating a
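
The archived message is cut off at this point, so the following is only a sketch of how the configuration might be passed along, not Paul's actual code. It assumes the file is read through SparkContext.newAPIHadoopFile with the new-API TextInputFormat. The input path, the application name, and the object name are placeholders invented for illustration. The split size value is in bytes and is copied from the message above.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, TextInputFormat}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver; meant to be launched with spark-submit, which supplies the master.
object SplitSizeSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("split-size-sketch"))

    // As in the thread. On newer Hadoop versions Job.getInstance() is the
    // non-deprecated way to obtain a Job.
    val job = new Job()

    // Cap each input split at 12800 bytes (the value from the thread), so a
    // large file is divided into many splits and therefore many partitions.
    FileInputFormat.setMaxInputSplitSize(job, 12800L)

    // Assumption: the truncated sentence refers to handing the job's
    // Configuration to the call that creates the RDD.
    val rdd = sc.newAPIHadoopFile(
      "hdfs:///path/to/input", // placeholder path
      classOf[TextInputFormat],
      classOf[LongWritable],
      classOf[Text],
      job.getConfiguration)

    // The partition count should grow as the max split size shrinks.
    println(s"partitions: ${rdd.partitions.length}")

    sc.stop()
  }
}
```

The new-API FileInputFormat picks the split size as max(minSize, min(maxSize, blockSize)), so lowering the max below the HDFS block size is what yields more splits. A value this small produces a very large number of tiny partitions on big files, so it is probably worth tuning to the cluster.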