Odd Behaviour with get_json() (or perhaps with explode(array))

Time Less Thu, 19 Jan 2012 13:27:49 -0800

We are running this query:

select name, sum_id
from (
    select name, players,
        array(player1, player2, player3, player4, player5, player6, player7, player8) arr
    from (
        select name,
            get_json_object(roster_json, '$.memberList.playerId') players,
            get_json_object(roster_json, '$.memberList.playerId[0]') player1,
            get_json_object(roster_json, '$.memberList.playerId[1]') player2,
            get_json_object(roster_json, '$.memberList.playerId[2]') player3,
            get_json_object(roster_json, '$.memberList.playerId[3]') player4,
            get_json_object(roster_json, '$.memberList.playerId[4]') player5,
            get_json_object(roster_json, '$.memberList.playerId[5]') player6,
            get_json_object(roster_json, '$.memberList.playerId[6]') player7,
            get_json_object(roster_json, '$.memberList.playerId[7]') player8
        from team
        -- limit 1000000 -- some large number to make sure it runs
    ) t2
    where player8 is not null -- 8 members
) t1
lateral view explode(arr) member as sum_id
;
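For anyone less familiar with get_json_object and LATERAL VIEW explode, here is a rough Python sketch of what the query is meant to compute. It is not Hive's implementation and does not reproduce the failure. It assumes roster_json looks like {"memberList": {"playerId": [...]}}; that shape, the helper names json_path_get and explode_rosters, and the sample rows are all hypothetical.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple


def json_path_get(doc: Optional[str], keys: List[str],
                  index: Optional[int] = None) -> Optional[str]:
    """Rough stand-in for get_json_object(doc, '$.k1.k2[index]').

    Walks the object keys, optionally indexes into a list, and returns the
    value as a string, or None if the JSON is invalid or the path is missing.
    """
    if doc is None:
        return None
    try:
        node = json.loads(doc)
    except json.JSONDecodeError:
        return None
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    if index is not None:
        if not isinstance(node, list) or not 0 <= index < len(node):
            return None
        node = node[index]
    if node is None:
        return None
    # Results are strings; non-string values are rendered as compact JSON.
    return node if isinstance(node, str) else json.dumps(node, separators=(",", ":"))


def explode_rosters(rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, Optional[str]]]:
    """Mimic the query: pull playerId[0..7] for each team, keep only teams whose
    8th slot is non-null, then emit one (name, player_id) row per slot."""
    for name, roster_json in rows:
        # player1 .. player8 from the innermost subquery
        players = [json_path_get(roster_json, ["memberList", "playerId"], i)
                   for i in range(8)]
        if players[7] is None:  # where player8 is not null
            continue
        for sum_id in players:  # lateral view explode(arr) member as sum_id
            yield name, sum_id


# Hypothetical sample rows for the team table: (name, roster_json).
# The roster_json shape here is an assumption, not taken from the real table.
teams = [
    ("full_team", json.dumps({"memberList": {"playerId": list(range(101, 109))}})),
    ("short_team", json.dumps({"memberList": {"playerId": [1, 2, 3, 4, 5]}})),
]

for row in explode_rosters(teams):
    print(row)
```

In this sketch only teams with at least eight playerId entries produce output, and each such team yields exactly eight rows, which is what the "where player8 is not null" filter plus the explode are meant to achieve.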


Notice the commented-out LIMIT in the innermost subquery. When we attempt
to run the query, we get this error:

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201112131753_22484, Tracking URL = http://laxhadoop1-001:50030/jobdetails.jsp?jobid=job_201112131753_22484
Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=laxhadoop1-001:54311 -kill job_201112131753_22484
2012-01-17 11:06:21,579 Stage-1 map = 0%,  reduce = 0%
2012-01-17 11:06:45,745 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201112131753_22484 with errors
java.lang.RuntimeException: Error while reading from task log url
        at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
        at org.apache.hadoop.hive.ql.exec.ExecDriver.showJobFailDebugInfo(ExecDriver.java:889)
        at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:680)
        at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:123)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:513)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: http://laxhadoop1-004:50060/tasklog?taskid=attempt_201112131753_22484_m_000000_0&all=true
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
        at java.net.URL.openStream(URL.java:1010)
        at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
        ... 16 more
Ended Job = job_201112131753_22484 with exception 'java.lang.RuntimeException(Error while reading from task log url)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask

If we uncomment the LIMIT clause, the query *runs WITHOUT ERROR* and
gives us the output we expect. However, the innermost subquery returns only
about 150k rows, so a LIMIT of 1,000,000 does not change the result; its only
effect is to work around this bug. I don't know the command to determine which
version of Hive we're running, so hopefully the installed RPM will tell you:

$ rpm -qa | grep hive
hadoop-hive-0.7.1+42.4-2

Does anyone have a clue about what's going on here?

-- 
Tim Ellis
Riot Games
