Write your own input format/datasource, or split the file yourself beforehand (not recommended).
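For the first option, a minimal sketch in Scala (the class name, the 2 GB split size, and the app name are made up for illustration): it overrides computeSplitSize() on the old-API TextInputFormat that textFile() uses under the hood, so every split, and hence every task, covers a fixed larger byte range, and sc.hadoopFile() plugs it in where textFile() would have used the stock format.

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.TextInputFormat
    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical format: force ~2 GB splits regardless of HDFS block size,
    // so a 60 GB file becomes ~30 tasks instead of one task per 250 MB split.
    class LargeSplitTextInputFormat extends TextInputFormat {
      override protected def computeSplitSize(goalSize: Long, minSize: Long,
                                              blockSize: Long): Long =
        2L * 1024 * 1024 * 1024
    }

    val sc = new SparkContext(new SparkConf().setAppName("large-splits"))

    // hadoopFile() takes the custom format where textFile() would have used
    // the stock TextInputFormat; keys are byte offsets, values are the lines.
    val lines = sc
      .hadoopFile("hdfs_file_path", classOf[LargeSplitTextInputFormat],
                  classOf[LongWritable], classOf[Text])
      .map(_._2.toString)

    println(s"partitions: ${lines.getNumPartitions}")

Depending on your Hadoop version you may get a similar effect without a custom class by raising mapreduce.input.fileinputformat.split.minsize on sc.hadoopConfiguration before calling textFile(); worth trying first.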
> On 10. Oct 2017, at 09:14, Kanagha Kumar <kpra...@salesforce.com> wrote:
>
> Hi,
>
> I'm trying to read a 60 GB HDFS file using spark textFile("hdfs_file_path", minPartitions).
>
> How can I control the number of tasks by increasing the split size? With the default split size of 250 MB, several tasks are created. But I would like a specific number of tasks to be created while reading from HDFS itself, instead of using repartition() etc.
>
> Any suggestions are helpful!
>
> Thanks