I have a query like this:

    select user.id,
           ud_name.value as name,
           ud_age.value as age
    from user
    left outer join user_data ud_name
      on user.id = ud_name.user_id and ud_name.key = 'name'
    left outer join user_data ud_age
      on user.id = ud_age.user_id and ud_age.key = 'age'
    ... ;
It has multiple joins over user_data, basically to flatten user_data into one row per user. I have 9 left outer joins and I'm getting reducer errors like this:

    Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: null
        at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:321)
        at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:467)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:393)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
    Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:709)
        at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:262)
        at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:306)
        ... 8 more

The weird thing is: if I remove one of the joins, regardless of which one, the query runs fine. So 8 joins seems to be the magic number. Has anyone ever seen something like this?
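
For context on what the query is meant to produce: the same flattening could presumably also be written with a single join and conditional aggregation instead of one self-join per key. This is only a sketch, not something I have run against this data. It assumes each (user_id, key) pair appears at most once in user_data (otherwise max() would collapse duplicates that the joins would return as separate rows), and the aliases u and ud are just illustrative names. Only the two keys shown above are included; the others would follow the same pattern.

```sql
-- Sketch: one row per user via conditional aggregation.
-- Assumption: at most one user_data row per (user_id, key).
select u.id,
       -- max() skips NULLs, so it picks the single matching value, if any
       max(case when ud.key = 'name' then ud.value end) as name,
       max(case when ud.key = 'age'  then ud.value end) as age
from user u
-- left outer join keeps users that have no user_data rows at all
left outer join user_data ud
  on u.id = ud.user_id
group by u.id;
```

Written this way there is a single join followed by a group by, so it would at least show whether the error is tied to the number of joins in one query.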