So, I have a query like this:

select
  user.id,
  ud_name.value as name,
  ud_age.value as age
from user
left outer join user_data ud_name
  on user.id = ud_name.user_id and ud_name.key = 'name'
left outer join user_data ud_age
  on user.id = ud_age.user_id and ud_age.key = 'age'
...
;


It joins user_data once per key, basically to flatten user_data into one
row per user. I have 9 left outer joins, and I'm getting reducer errors like
this:

Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: null
    at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:321)
    at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:467)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:393)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:709)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:262)
    at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:306)
    ... 8 more


The weird thing is: if I remove one of the joins, regardless of which one,
the query runs fine. So, 8 joins seem to be the magic number. Has anyone
ever seen something like this?
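
For reference, the same flattening could presumably be written with a single
join and conditional aggregation instead of one join per key. This is only a
sketch, not something I have run against this data: it assumes each
(user_id, key) pair appears at most once in user_data, and the aliases u and
ud are made up for the example. Only the two keys shown above are included.

```sql
-- Sketch only: assumes at most one row per (user_id, key) in user_data.
-- The aliases u and ud are illustrative, not from the original query.
select
  u.id,
  -- case without else yields NULL, so max() picks the single matching value
  max(case when ud.key = 'name' then ud.value end) as name,
  max(case when ud.key = 'age' then ud.value end) as age
from user u
left outer join user_data ud
  on u.id = ud.user_id
group by u.id;
```

If a key can occur more than once per user, this collapses the duplicates to
the largest value, whereas the join version would return one row per
combination, so the two are not equivalent in that case.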
