Hi all, I am trying to load Pig output into Hive as an external table, but Hive always sets the number of mappers to 1, even though the data has more than 10 million records and is spread across multiple files. Does anyone have any idea why?
To be more specific, the output is in Parquet format, generated by a Pig script without any compression:

    STORE rows INTO '/table-data/test' USING parquet.pig.ParquetStorer;

The directory contains 16 part-m-00xx.parquet files plus a _metadata file, and the external table points to that directory. Here is the CREATE TABLE statement I used:

    CREATE EXTERNAL TABLE `t_main_wop`(
      `id` string,
      `f1` string,
      ...
    )
    STORED AS PARQUET
    LOCATION '/table-data/test';

Hive seems to read the Parquet files themselves properly, since SELECT * FROM t_main_wop; returns the expected result. However, every time I run a query that requires a MapReduce job, it uses only a single mapper and takes forever:

    hive> select count(*) from t_main_wop;
    Query ID = xxx
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_yyy, Tracking URL = zzz
    Kill Command = hadoop_job -kill job_yyy
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2014-12-24 02:49:46,912 Stage-1 map = 0%, reduce = 0%
    2014-12-24 02:50:45,847 Stage-1 map = 0%, reduce = 0%

Why is that? I have set mapred.map.tasks=100, but to no avail. Since the directory contains 16 part files, I would expect Hive to be able to use at least 16 mappers.

I would really appreciate any suggestions.

Thanks,
Akira
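P.S. In case the input format is relevant: as far as I understand, Hive's default hive.input.format is CombineHiveInputFormat, which can merge many files into a single split, and mapred.map.tasks is only a hint for some input formats. This is what I am planning to try next (the split-size value is an arbitrary guess, and I am not sure these settings behave the same for Parquet on my Hive version):

```sql
-- Switch to the non-combining input format so each file gets its own split
-- (assumption: this is the right knob for my Hive version).
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

-- Cap the split size so large files are broken into multiple splits
-- (256 MB here is just an illustrative value).
set mapred.max.split.size=268435456;

select count(*) from t_main_wop;
```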