Jonathan Natkins created HIVE-3333:
--------------------------------------

             Summary: Specified SerDe does not get used when executing a query over JSON data
                 Key: HIVE-3333
                 URL: https://issues.apache.org/jira/browse/HIVE-3333
             Project: Hive
          Issue Type: Bug
          Components: Serializers/Deserializers
            Reporter: Jonathan Natkins
         Attachments: hive-test-case.tar.gz

I found a JSON SerDe that I wanted to try out, and I ran into some issues attempting to use it. The script I was executing looks like this:

ADD JAR /home/natty/hive-test-case/hive-json-serde-0.2.jar;

CREATE TABLE bar (
  id INT,
  integers ARRAY<INT>,
  datum STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde';

LOAD DATA LOCAL INPATH '/home/natty/sample_data/json.sample'
OVERWRITE INTO TABLE bar;

SELECT * FROM bar;

The data I loaded in looks like this:

{ "id": 1, "integers": [ 1, 2, 3 ], "datum": "hello" },

When the "SELECT * FROM bar" query executes, it returns with a failure:

hive> ADD JAR /home/natty/hive-test-case/hive-json-serde-0.2.jar;
Added /home/natty/hive-test-case/hive-json-serde-0.2.jar to class path
Added resource: /home/natty/hive-test-case/hive-json-serde-0.2.jar
hive> SELECT * FROM bar;
OK
Failed with exception java.io.IOException:java.lang.ClassCastException: org.json.JSONArray cannot be cast to [Ljava.lang.Object;
Time taken: 2.335 seconds

Now, this alone doesn't bother me.
What bothers me is that, if I look at the log file, I see the following exception:

2012-08-03 13:12:11,407 ERROR CliDriver (SessionState.java:printError(380)) - Failed with exception java.io.IOException:java.lang.ClassCastException: org.json.JSONArray cannot be cast to [Ljava.lang.Object;
java.io.IOException: java.lang.ClassCastException: org.json.JSONArray cannot be cast to [Ljava.lang.Object;
	at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:173)
	at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1383)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:266)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.ClassCastException: org.json.JSONArray cannot be cast to [Ljava.lang.Object;
	at org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:98)
	at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:287)
	at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:213)
	at org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:59)
	at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:365)
	at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:163)
	... 11 more

Note that this exception indicates that Hive is executing code for the DelimitedJSONSerDe, rather than the one that I specified (JsonSerde from the jar file). Seems incorrect.
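
The cast named in the "Caused by" line can be reproduced outside Hive. The following is a minimal sketch, not Hive code. It assumes that StandardListObjectInspector.getList accepts either a java.util.List or an Object[] and casts anything else to Object[]; this is an inference from the exception text and has not been checked against the Hive source. ListCastSketch, FakeJsonArray, and tryGetList are hypothetical names, with FakeJsonArray standing in for org.json.JSONArray.

```java
import java.util.Arrays;
import java.util.List;

class ListCastSketch {
    /** Hypothetical stand-in for org.json.JSONArray: neither a List nor an Object[]. */
    static final class FakeJsonArray {
        final int[] values;
        FakeJsonArray(int... values) { this.values = values; }
    }

    /**
     * Assumed pattern (not copied from Hive): pass a List through unchanged,
     * otherwise treat the value as an Object[] and wrap it in a List.
     */
    static List<?> getList(Object data) {
        if (data == null) {
            return null;
        }
        if (!(data instanceof List)) {
            // Throws ClassCastException for anything that is not an Object[].
            data = Arrays.asList((Object[]) data);
        }
        return (List<?>) data;
    }

    /** Runs getList and reports whether the cast succeeded. */
    static String tryGetList(Object data) {
        try {
            return "ok: " + getList(data);
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryGetList(Arrays.asList(1, 2, 3)));      // a List passes through
        System.out.println(tryGetList(new Object[] {1, 2, 3}));      // an Object[] is wrapped
        System.out.println(tryGetList(new FakeJsonArray(1, 2, 3)));  // any other type fails the cast
    }
}
```

Under that assumption, a List or an Object[] is accepted and any other container type fails with the same kind of ClassCastException as in the log. The sketch only illustrates the failing cast; it does not show which SerDe produced the value that was passed in.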