How many nodes and cores do you have, and how much memory?
What Hive version?

Do you have the option to use Tez as the execution engine?
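If it is available, switching the engine for your session is a single setting:

    set hive.execution.engine=tez;
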
Usually I use external tables only to read the raw data and insert it into a 
table in ORC or Parquet format for doing analytics.
That is much more performant than JSON or any other text-based format.
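
For instance, something along these lines (a rough sketch; the table names are 
placeholders and it assumes your external JSON table already has a working SerDe):

    CREATE TABLE events_orc STORED AS ORC AS
    SELECT * FROM events_json_ext;

Then point your analytics queries at events_orc instead of the external JSON table.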

> On 03 Dec 2015, at 14:20, Harsha HN <99harsha.h....@gmail.com> wrote:
> 
> Hi,
> 
> We have LZO compressed JSON files in our HDFS locations. I am creating an 
> "External" table on the data in HDFS for the purpose of analytics. 
> 
> There are 3 LZO compressed part files of size 229.16 MB, 705.79 MB, 157.61 MB 
> respectively along with their index files. 
> 
> When I run a count(*) query on the table I observe only 10 mappers, which 
> causes a performance bottleneck. 
> 
> I even tried the following (aiming for a 30 MB split):
> 1) set mapreduce.input.fileinputformat.split.maxsize=31457280;
> 2) set dfs.blocksize=31457280;
> But I am still getting 10 mappers.
> 
> Can you please guide me in fixing this?
> 
> Thanks,
> Sree Harsha
