I figured out the problem. The JSON SerDe I wrote is not case-sensitive, but the ORC and Parquet SerDes are.
So this works:

    select ClientCode, Encounter.Number from parquet_tbl;

but this does not:

    select clientcode, encounter.Number from parquet_tbl;

-Michael

On Thu, Apr 3, 2014 at 5:05 PM, <mpeters...@gmail.com> wrote:
> Hi,
>
> I'm new to using Parquet and ORC files and I'm hitting a problem with
> querying nested data. Can those file formats be used to query deeply
> nested data?
>
> If yes, why am I getting an error with the SerDes for both of them?
>
> Here's the background:
>
> I'm starting from a JSON data file like this:
>
> {
>   "ClientCode": "ABC",
>   "JSONUpdateDtm": "200901011000",
>   "Encounter": {
>     "Number": "5555555-9999999",
>     "Patient": {
>       "PatientNumber": "987654321",
>       "SSN": "123-45-6789"
>     },
>     "Payers": [
>       {
>         "SequenceNumber": "1",
>         "Payer": "MC",
>         "Description": "Medicaid"
>       },
>       {
>         "SequenceNumber": "2",
>         "Payer": "XYZ"
>       }
>     ]
>   }
> }
>
> and I've created a Hive table with this schema using a JSON SerDe:
>
>   ClientCode STRING,
>   Encounter STRUCT<Number:STRING,
>                    Patient:STRUCT<PatientNumber:STRING, SSN:STRING>,
>                    Payers:ARRAY<STRUCT<Description:STRING,
>                                        Payer:STRING,
>                                        SequenceNumber:STRING>>>,
>   JSONUpdateDtm STRING)
>
> I can issue this query just fine:
>
> hive> select clientcode, encounter.Number, encounter.patient.ssn from
> json_tbl;
> OK
> DEF 4444-88888 444-45-4444
> ABC 5555555-9999999 123-45-6789
>
> I then created Parquet and ORCFile versions of this data set:
>
> CREATE TABLE parquet_tbl (
>   ClientCode STRING,
>   Encounter STRUCT<Number:STRING,
>                    Patient:STRUCT<PatientNumber:STRING, SSN:STRING>,
>                    Payers:ARRAY<STRUCT<Description:STRING,
>                                        Payer:STRING,
>                                        SequenceNumber:STRING>>>,
>   JSONUpdateDtm STRING)
> ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
> STORED AS
>   INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
>   OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';
>
> INSERT OVERWRITE TABLE parquet_tbl SELECT * from json_tbl;
>
> CREATE TABLE orc_tbl (
>   ClientCode STRING,
>   Encounter STRUCT<Number:STRING,
>                    Patient:STRUCT<PatientNumber:STRING, SSN:STRING>,
>                    Payers:ARRAY<STRUCT<Description:STRING,
>                                        Payer:STRING,
>                                        SequenceNumber:STRING>>>,
>   JSONUpdateDtm STRING)
> STORED AS orc;
>
> INSERT OVERWRITE TABLE orc_tbl SELECT * from json_tbl;
>
> I can query these tables when I query two levels deep, but *not three.
> Am I doing something wrong? Or do these data formats not support deeply
> nested queries?*
>
> hive> select clientcode, encounter.Number from parquet_tbl;
> OK
> DEF 4444-88888
> ABC 5555555-9999999
>
> hive> select clientcode, encounter.Number from orc_tbl;
> OK
> DEF 4444-88888
> ABC 5555555-9999999
>
> hive> select clientcode, encounter.Number, encounter.patient.ssn from
> orc_tbl;
>
> Diagnostic Messages for this Task:
> Error: java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:425)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>     ... 9 more
> Caused by: java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>     ... 14 more
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>     ... 17 more
> Caused by: java.lang.RuntimeException: Map operator initialization failed
>     at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:134)
>     ... 22 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:61)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:53)
>     at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:992)
>     at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:1018)
>     at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:64)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:453)
>     at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:409)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:188)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:425)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377)
>     at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:113)
>     ... 22 more
>
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
>
> Thank you,
> Michael
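
P.S. For anyone who hits this later: given the case-sensitivity behavior above, one workaround sketch (untested; the table name parquet_tbl_lc is made up for illustration) is to declare the columns and nested field names in all lowercase, so that lowercase queries match the declared names exactly:

    -- Same schema and SerDe as parquet_tbl, but with every identifier
    -- declared in lowercase.
    CREATE TABLE parquet_tbl_lc (
      clientcode STRING,
      encounter STRUCT<number:STRING,
                       patient:STRUCT<patientnumber:STRING, ssn:STRING>,
                       payers:ARRAY<STRUCT<description:STRING,
                                           payer:STRING,
                                           sequencenumber:STRING>>>,
      jsonupdatedtm STRING)
    ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
    STORED AS
      INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
      OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';

    INSERT OVERWRITE TABLE parquet_tbl_lc SELECT * FROM json_tbl;

    -- Nested access spelled to match the declared (lowercase) names:
    SELECT clientcode, encounter.number, encounter.patient.ssn
    FROM parquet_tbl_lc;

The simpler alternative is to keep the original tables and spell nested fields with the exact case declared in the DDL, e.g. Encounter.Patient.SSN.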