We are running this query: select name, sum_id from ( select name, players, array(player1, player2, player3, player4, player5, player6, player7, player8) arr from ( select name, get_json_object(roster_json, '$.memberList.playerId') players, get_json_object(roster_json, '$.memberList.playerId\[0]') player1, get_json_object(roster_json, '$.memberList.playerId\[1]') player2, get_json_object(roster_json, '$.memberList.playerId\[2]') player3, get_json_object(roster_json, '$.memberList.playerId\[3]') player4, get_json_object(roster_json, '$.memberList.playerId\[4]') player5, get_json_object(roster_json, '$.memberList.playerId\[5]') player6, get_json_object(roster_json, '$.memberList.playerId\[6]') player7, get_json_object(roster_json, '$.memberList.playerId\[7]') player8 from team *-- limit 1000000 -- some large number to make sure it runs* ) t2 where player8 is not null -- 8 members ) t1 lateral view explode(arr) member as sum_id ;
Notice the commented-out LIMIT in the innermost subquery. When we attempt to run the query, we get this error: Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_201112131753_22484, Tracking URL = http://laxhadoop1-001:50030/jobdetails.jsp?jobid=job_201112131753_22484 Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=laxhadoop1-001:54311 -kill job_201112131753_22484 2012-01-17 11:06:21,579 Stage-1 map = 0%, reduce = 0% 2012-01-17 11:06:45,745 Stage-1 map = 100%, reduce = 100% Ended Job = job_201112131753_22484 with errors java.lang.RuntimeException: Error while reading from task log url at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130) at org.apache.hadoop.hive.ql.exec.ExecDriver.showJobFailDebugInfo(ExecDriver.java:889) at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:680) at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:123) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:513) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:186) Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: http://laxhadoop1-004:50060/tasklog?taskid=attempt_201112131753_22484_m_000000_0&all=true at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436) at java.net.URL.openStream(URL.java:1010) at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120) ... 16 more Ended Job = job_201112131753_22484 with exception 'java.lang.RuntimeException(Error while reading from task log url)' FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask If we uncomment the LIMIT clause, then the query *runs WITHOUT ERROR* and gives us the output we expect. However, the innermost subquery is about 150k rows of data, so the LIMIT has no effect other than working around this bug. I don't know the command to determine the version of Hive we're running, so hopefully seeing the installed RPM will determine that: $ rpm -qa | grep hive hadoop-hive-0.7.1+42.4-2 Does anyone have a clue about what's going on here? -- Tim Ellis Riot Games