Hi All,

We have a use case where we created a partitioned external table in Hive 2.3.3 pointing to a Parquet location that contains date-level folders; on some days the Parquet files were created by Hive 2.1.1 and on other days by AWS Glue. Now, when we try to read this data, we get the error below:
Vertex failed, vertexName=Map 1, vertexId=vertex_1535191533874_0135_2_00, diagnostics=[Task failed, taskId=task_1535191533874_0135_2_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1535191533874_0135_2_00_000000_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.hive.serde2.io.DateWritable
    at org.apache.hadoop.hive.ql.io.parquet.serde.primitive.ParquetStringInspector.getPrimitiveJavaObject(ParquetStringInspector.java:77)
    at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:247)
    at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:366)
    at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:202)
    at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:188)
    at org.apache.hadoop.hive.ql.exec.MapOperator.toErrorMessage(MapOperator.java:588)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:554)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:70)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
]
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data
with exception java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.hive.serde2.io.DateWritable
    at org.apache.hadoop.hive.ql.io.parquet.serde.primitive.ParquetStringInspector.getPrimitiveJavaObject(ParquetStringInspector.java:77)

After some drilling down, I inspected the column schemas inside both types of Parquet files using parquet-tools and found different data types for some columns. For the columns where the data type is the same in both kinds of files, we can query them successfully through the external table above, but wherever the types differ we get the error. How can we get out of this? Reprocessing all the data with a single engine (Hive or Glue) would be very costly for us, as we have 2-3 years of data.

The table below shows the data type of a few columns in each kind of file, along with the data type of the same column in the external table; "Test Result" indicates whether I was able to read the column.

| Parquet made with Hive 2.1.1         | Parquet made with AWS Glue           | Final Hive table (reads both) | Test Result |
|--------------------------------------|--------------------------------------|-------------------------------|-------------|
| optional int32 action_date (DATE)    | optional binary action_date (UTF8)   | action_date : string          | Fail        |
| optional int64 user_id               | optional int64 user_id               | user_id : bigint              | Pass        |
| optional binary tracking_type (UTF8) | required binary tracking_type (UTF8) | tracking_type : string        | Fail        |
| optional int32 game_number           | optional int64 game_number           | game_number : bigint          | Pass        |

Regards,
Anup Tiwari