How many nodes, how many cores, and how much memory do you have? Which Hive version? Do you have the option of using Tez as the execution engine? Usually I use external tables only for reading the raw data and inserting it into a table stored as ORC or Parquet for the actual analytics. That is much more performant than JSON or any other text-based format.
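To illustrate, a minimal sketch of that conversion step (the table names events_json and events_orc are placeholders; it assumes your existing external table over the LZO JSON already reads correctly, and that your Hive version supports CTAS into ORC):

    -- one-off copy from the external JSON staging table into ORC
    CREATE TABLE events_orc STORED AS ORC
    AS SELECT * FROM events_json;

    -- analytics queries then run against the ORC copy
    SELECT count(*) FROM events_orc;

Queries such as count(*) against the ORC table split and parallelize far better than against LZO-compressed JSON text.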
> On 03 Dec 2015, at 14:20, Harsha HN <99harsha.h....@gmail.com> wrote:
>
> Hi,
>
> We have LZO compressed JSON files in our HDFS locations. I am creating an
> "External" table on the data in HDFS for the purpose of analytics.
>
> There are 3 LZO compressed part files of size 229.16 MB, 705.79 MB, 157.61 MB
> respectively along with their index files.
>
> When I run count(*) query on the table I observe only 10 mappers causing
> performance bottleneck.
>
> I even tried following, (going for 30MB split)
> 1) set mapreduce.input.fileinputformat.split.maxsize=31457280;
> 2) set dfs.blocksize=31457280;
> But still I am getting 10 mappers.
>
> Can you please guide me in fixing the same?
>
> Thanks,
> Sree Harsha