Hi,

I am trying to run a simple join query on Hive 0.13.
Both tables are stored as text. Both tables are read in the mappers, yet
the error below is thrown in the reducer. I don't understand why a reducer
would be reading a table after the mappers have already read it, or why it
assumes the video file is in SequenceFile format.
Below you can find the query, the query plan, and the error. Any help will
be greatly appreciated.
Thanks,
Sid
*Hadoop Version:* 2.0.0-mr1
Query:
SELECT computerguid
FROM revenue_start_adeffx_v2
JOIN video
ON revenue_start_adeffx_v2.video_id = video.video_id
WHERE hourid = '389567';
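(In case it helps, this is the sort of check I can run to confirm the declared formats — just a sketch, I have not attached its output:)

```sql
-- Shows the InputFormat, OutputFormat, SerDe and file location that the
-- metastore has recorded for each table; useful for spotting a mismatch
-- between the declared storage format and the actual data files
-- (e.g. a table declared STORED AS SEQUENCEFILE over plain text files).
DESCRIBE FORMATTED video;
DESCRIBE FORMATTED revenue_start_adeffx_v2;
```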
Query Plan:
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: revenue_start_adeffx_v2
            Statistics: Num rows: 3175840 Data size: 330287403 Basic stats: COMPLETE Column stats: NONE
            Reduce Output Operator
              key expressions: video_id (type: int)
              sort order: +
              Map-reduce partition columns: video_id (type: int)
              Statistics: Num rows: 3175840 Data size: 330287403 Basic stats: COMPLETE Column stats: NONE
              value expressions: computerguid (type: string)
          TableScan
            alias: video
            Statistics: Num rows: 146679792 Data size: 586719168 Basic stats: COMPLETE Column stats: NONE
            Reduce Output Operator
              key expressions: video_id (type: int)
              sort order: +
              Map-reduce partition columns: video_id (type: int)
              Statistics: Num rows: 146679792 Data size: 586719168 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          condition expressions:
            0 {VALUE._col0}
            1
          outputColumnNames: _col0
          Statistics: Num rows: 161347776 Data size: 645391104 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: string)
            outputColumnNames: _col0
            Statistics: Num rows: 161347776 Data size: 645391104 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 161347776 Data size: 645391104 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
Error:
2014-06-11 10:18:34,818 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://<NN><Path>/video/video_20140611051139 not a SequenceFile
    at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
    at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.io.IOException: hdfs:/<NN><Path>/hive/warehouse/video/video_20140611051139 not a SequenceFile
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1805)
    at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1714)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
    at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59)
    at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:226)
    ... 12 more
2014-06-11 10:18:34,822 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2014-06-11 10:18:34,824 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://<NN><Path>/video_20140611051139 not a SequenceFile
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://<NN><Path>/video/video_20140611051139 not a SequenceFile
    at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
    at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
    ... 7 more
Caused by: java.io.IOException: hdfs://<NN><Path>/video/video_20140611051139 not a SequenceFile
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1805)
    at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1714)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
    at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59)
    at org.apache.