Maybe you need to set the parameters for the mapreduce API rather than the mapred 
API. I don't remember offhand how they differ, but the Hadoop documentation should 
tell you ;-)
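
Something along these lines (untested sketch, in Java): set the split-size keys on 
the SparkContext's Hadoop configuration before calling textFile. The mapreduce.* 
names are the current ones and the mapred.* names are the deprecated aliases, so the 
sketch sets both; the HDFS path and the 512 MB target size are just placeholders.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SplitSizeExample {
    public static void main(String[] args) {
        JavaSparkContext jsc =
            new JavaSparkContext(new SparkConf().setAppName("split-size-example"));

        // Target split size: placeholder value, adjust to control the number of tasks.
        long targetSplitBytes = 512L * 1024 * 1024; // 512 MB

        // New-API (mapreduce.*) keys ...
        jsc.hadoopConfiguration().setLong(
            "mapreduce.input.fileinputformat.split.minsize", targetSplitBytes);
        jsc.hadoopConfiguration().setLong(
            "mapreduce.input.fileinputformat.split.maxsize", targetSplitBytes);
        // ... and the deprecated old-API (mapred.*) aliases, just in case.
        jsc.hadoopConfiguration().setLong("mapred.min.split.size", targetSplitBytes);
        jsc.hadoopConfiguration().setLong("mapred.max.split.size", targetSplitBytes);

        // Placeholder path; with ~512 MB splits, a 60 GB file should yield
        // roughly 120 partitions (and hence ~120 read tasks).
        JavaRDD<String> lines = jsc.textFile("hdfs:///path/to/60gb/file");
        System.out.println("Partitions: " + lines.getNumPartitions());

        jsc.stop();
    }
}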

> On 10. Oct 2017, at 17:53, Kanagha Kumar <kpra...@salesforce.com> wrote:
> 
> Thanks for the inputs!!
> 
> I passed in spark.mapred.max.split.size and spark.mapred.min.split.size, set to the 
> split size I wanted to read. It didn't take effect.
> I also tried passing in spark.dfs.block.size, with all the params set to the 
> same value.
> 
> JavaSparkContext.fromSparkContext(spark.sparkContext()).textFile(hdfsPath, 
> 13);
> 
> Is there any other param that needs to be set as well?
> 
> Thanks
> 
>> On Tue, Oct 10, 2017 at 4:32 AM, ayan guha <guha.a...@gmail.com> wrote:
>> I have not tested this, but you should be able to pass any MapReduce-style conf 
>> to the underlying Hadoop config. Essentially, you should be able to control the 
>> split behaviour just as you would in a MapReduce program (since Spark uses the 
>> same input format).
>> 
>>> On Tue, Oct 10, 2017 at 10:21 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>> Write your own input format/datasource or split the file yourself 
>>> beforehand (not recommended).
>>> 
>>> > On 10. Oct 2017, at 09:14, Kanagha Kumar <kpra...@salesforce.com> wrote:
>>> >
>>> > Hi,
>>> >
>>> > I'm trying to read a 60 GB HDFS file using Spark's 
>>> > textFile("hdfs_file_path", minPartitions).
>>> >
>>> > How can I control the number of tasks by increasing the split size? With the 
>>> > default split size of 250 MB, many tasks are created. I would like a specific 
>>> > number of tasks to be created while reading from HDFS itself, instead of 
>>> > using repartition() etc.
>>> >
>>> > Any suggestions are helpful!
>>> >
>>> > Thanks
>>> >
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>> 
>> 
>> 
>> 
>> -- 
>> Best Regards,
>> Ayan Guha
> 
> 
> 
