Put the many small files into a Hadoop Archive (HAR) to improve the performance of reading them. Alternatively, run a batch job that concatenates them into fewer, larger files.
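For the archive route, something like `hadoop archive -archiveName files.har -p /input/parent /output` should do it (check the exact flags for your Hadoop version). For the concatenation route, here is a minimal local sketch in plain Scala; the object and directory names are hypothetical, and for real data you would write the merged output back to HDFS (e.g. via `hadoop fs -getmerge` or a small Spark job):

```scala
import java.nio.file.{Files, Paths, StandardOpenOption}
import scala.collection.JavaConverters._

// Hypothetical batch job: append every file in inputDir into one output file.
object ConcatSmallFiles {
  def concat(inputDir: String, outputFile: String): Unit = {
    val out = Paths.get(outputFile)
    Files.deleteIfExists(out)
    Files.createFile(out)
    // Sort for a deterministic concatenation order.
    val inputs = Files.list(Paths.get(inputDir)).iterator().asScala.toSeq.sortBy(_.toString)
    for (p <- inputs) {
      Files.write(out, Files.readAllBytes(p), StandardOpenOption.APPEND)
    }
  }
}
```

Either way, the point is to cut the per-file overhead (NameNode metadata, task scheduling) that makes reading thousands of tiny files slow.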
> On 11 Feb 2016, at 18:33, Junjie Qian <qian.jun...@outlook.com> wrote:
>
> Hi all,
>
> I am working with Spark 1.6, scala and have a big dataset divided into
> several small files.
>
> My question is: right now the read operation takes really long time and often
> has RDD warnings. Is there a way I can read the files in parallel, that all
> nodes or workers read the file at the same time?
>
> Many thanks
> Junjie