Hi Thai,

Any links or examples for achieving this? I do not have much experience with this approach.
On Thu, 30 Aug 2018 20:08, Thai Bui <blquyt...@gmail.com> wrote:

> Another option is to implement a custom ParquetInputFormat extending the
> current Hive MR Parquet format and handle schema coercion at the input
> split/record reader level. This would be more involved, but it is guaranteed
> to work if you can add auxiliary jars to your Hive cluster.
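A minimal sketch of what this could look like, assuming Hive's stock MapredParquetInputFormat as the base class. It wraps the standard record reader and coerces DateWritable values to Text before Hive's string inspector sees them, i.e. the "Cannot inspect DateWritable" case in the error quoted below. This is untested; the package and class names are illustrative, not an existing Hive API:

// Hypothetical package; the class name is illustrative.
package com.example.hive;

import java.io.IOException;

import org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat;
import org.apache.hadoop.hive.serde2.io.DateWritable;
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class CoercingParquetInputFormat extends MapredParquetInputFormat {

  @Override
  public RecordReader<NullWritable, ArrayWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    // Let the stock Hive Parquet reader do the actual reading.
    RecordReader<NullWritable, ArrayWritable> inner =
        super.getRecordReader(split, job, reporter);

    // Delegate everything to the real reader, rewriting each row on the way out.
    return new RecordReader<NullWritable, ArrayWritable>() {
      @Override
      public boolean next(NullWritable key, ArrayWritable value) throws IOException {
        if (!inner.next(key, value)) {
          return false;
        }
        Writable[] fields = value.get();
        for (int i = 0; i < fields.length; i++) {
          // Coerce DATE-typed values to their "yyyy-MM-dd" string form so a
          // string column in the table can read them. Other mismatches would
          // need similar branches here.
          if (fields[i] instanceof DateWritable) {
            fields[i] = new Text(((DateWritable) fields[i]).get().toString());
          }
        }
        value.set(fields);
        return true;
      }

      @Override public NullWritable createKey() { return inner.createKey(); }
      @Override public ArrayWritable createValue() { return inner.createValue(); }
      @Override public long getPos() throws IOException { return inner.getPos(); }
      @Override public float getProgress() throws IOException { return inner.getProgress(); }
      @Override public void close() throws IOException { inner.close(); }
    };
  }
}

To try something like this, the jar would be made visible to the cluster (for example via hive.aux.jars.path), and the external table would be declared with ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'com.example.hive.CoercingParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'.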
> On Wed, Aug 29, 2018 at 8:06 AM Anup Tiwari <anupsdtiw...@gmail.com> wrote:
>
>> Hi All,
>>
>> We have a use case where we have created a partitioned external table in
>> Hive 2.3.3 pointing to a Parquet location with date-level folders. On some
>> days the Parquet files were created by Hive 2.1.1, and on other days by
>> AWS Glue. When we try to read this data, we get the error below:
>>
>> Vertex failed, vertexName=Map 1, vertexId=vertex_1535191533874_0135_2_00,
>> diagnostics=[Task failed, taskId=task_1535191533874_0135_2_00_000000,
>> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task
>> (failure) :
>> attempt_1535191533874_0135_2_00_000000_0:java.lang.RuntimeException:
>> java.lang.RuntimeException:
>> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
>> processing row [Error getting row data with exception
>> java.lang.UnsupportedOperationException: Cannot inspect
>> org.apache.hadoop.hive.serde2.io.DateWritable
>>   at org.apache.hadoop.hive.ql.io.parquet.serde.primitive.ParquetStringInspector.getPrimitiveJavaObject(ParquetStringInspector.java:77)
>>   at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:247)
>>   at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:366)
>>   at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:202)
>>   at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:188)
>>   at org.apache.hadoop.hive.ql.exec.MapOperator.toErrorMessage(MapOperator.java:588)
>>   at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:554)
>>   at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86)
>>   at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:70)
>>   at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
>>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>>   at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at javax.security.auth.Subject.doAs(Subject.java:422)
>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>   at java.lang.Thread.run(Thread.java:748)
>> ]
>>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>>   at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at javax.security.auth.Subject.doAs(Subject.java:422)
>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>   at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.lang.RuntimeException:
>> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
>> processing row [Error getting row data with exception
>> java.lang.UnsupportedOperationException: Cannot inspect
>> org.apache.hadoop.hive.serde2.io.DateWritable
>>   at org.apache.hadoop.hive.ql.io.parquet.serde.primitive.ParquetStringInspector.getPrimitiveJavaObject(ParquetStringInspector.java:77)
>>
>> After some digging, I inspected the column schemas inside both types of
>> Parquet file using parquet-tools and found different data types for some
>> columns. For columns where the data type is the same in both files, we can
>> query them successfully through the external table above, but where the
>> types differ we get the error, so I wanted your suggestion on how to get
>> out of this.
>>
>> Reprocessing all the data with the same engine (Hive/Glue) would be very
>> costly for us, since we have data for the last 2-3 years. The table below
>> lists, for some columns, the data type in each kind of file and in the
>> external table, with a "Test Result" column saying whether I was able to
>> read the column:
>>
>> Parquet made with Hive 2.1.1           | Parquet made with AWS Glue            | Final Hive table (reads both) | Test Result
>> ---------------------------------------+---------------------------------------+-------------------------------+------------
>> optional int32 action_date (DATE);     | optional binary action_date (UTF8);   | action_date : string          | Fail
>> optional int64 user_id;                | optional int64 user_id;               | user_id : bigint              | Pass
>> optional binary tracking_type (UTF8);  | required binary tracking_type (UTF8); | tracking_type : string        | Fail
>> optional int32 game_number;            | optional int64 game_number;           | game_number : bigint          | Pass
>>
>> Regards,
>> Anup Tiwari
>
> --
> Thai
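For reference, the per-file schema check Anup describes with parquet-tools can also be done programmatically via the standard parquet-hadoop API. A small sketch (the class name is illustrative and the file path is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;
import org.apache.parquet.schema.MessageType;

public class PrintParquetSchema {
  public static void main(String[] args) throws Exception {
    // args[0]: path to one data file from a given partition,
    // e.g. an hdfs:// or s3a:// URI (placeholder).
    try (ParquetFileReader reader = ParquetFileReader.open(
        HadoopInputFile.fromPath(new Path(args[0]), new Configuration()))) {
      // Print the file's footer schema, similar to `parquet-tools schema`.
      MessageType schema = reader.getFileMetaData().getSchema();
      System.out.println(schema);
    }
  }
}

Run against one file from a Hive-written partition and one from a Glue-written partition, the output should show the same kind of divergence as the table above, e.g. "optional int32 action_date (DATE);" versus "optional binary action_date (UTF8);".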