[ https://issues.apache.org/jira/browse/HIVE-26929?focusedWorklogId=843977&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-843977 ]
ASF GitHub Bot logged work on HIVE-26929:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 07/Feb/23 04:34
Start Date: 07/Feb/23 04:34
Worklog Time Spent: 10m
Work Description: ayushtkn merged PR #4011:
URL: https://github.com/apache/hive/pull/4011

Issue Time Tracking
-------------------

Worklog Id: (was: 843977)
Time Spent: 40m (was: 0.5h)

> Iceberg: Allow creating Iceberg tables without a column definition when the 'metadata_location' table property is set.
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26929
>                 URL: https://issues.apache.org/jira/browse/HIVE-26929
>             Project: Hive
>          Issue Type: Improvement
>          Components: Iceberg integration
>            Reporter: Dharmik Thakkar
>            Assignee: Ayush Saxena
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Allow creating Iceberg tables without a column definition when the 'metadata_location' table property is set.
>
> Iceberg supports pointing a table at an external metadata.json file to infer the table schema. When 'metadata_location' is set in TBLPROPERTIES, the schema from metadata.json is used to create the table, regardless of any columns declared in the CREATE TABLE statement. We should therefore allow creating a table without any column definition when 'metadata_location' is set.
> {code:java}
> create table test_meta (id int, name string, cgpa decimal) stored by iceberg stored as orc;
> describe formatted test_meta;
> create table test_meta_copy(id int) stored by iceberg tblproperties('metadata_location'='s3a://s3bucket/warehouse/tablespace/external/hive/iceberg_test_db_hive.db/test_meta/metadata/00000-7dfd7602-f5e1-4473-97cb-79377d358aa3.metadata.json');
> {code}
> The SQL above yields test_meta_copy with the same schema as test_meta, irrespective of the columns specified in the CREATE TABLE statement:
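> The schema the engine copies lives inside the metadata.json file that 'metadata_location' points to, so no column list is actually needed. As a rough illustration only (a minimal Python sketch over a hand-built, hypothetical metadata dict; the field names follow the Iceberg table-metadata layout, not any Hive API):

```python
def schema_from_metadata(metadata: dict) -> list[tuple[str, str]]:
    """Extract (column name, type) pairs from Iceberg table metadata.

    Handles both the format-v1 layout (a single "schema" key) and the
    format-v2 layout ("schemas" list plus "current-schema-id").
    """
    if "schemas" in metadata:  # format v2: pick the current schema by id
        current = metadata["current-schema-id"]
        schema = next(s for s in metadata["schemas"] if s["schema-id"] == current)
    else:  # format v1: a single embedded schema
        schema = metadata["schema"]
    return [(field["name"], field["type"]) for field in schema["fields"]]


# Synthetic stand-in for the metadata.json of the test_meta table above.
metadata = {
    "format-version": 2,
    "current-schema-id": 0,
    "schemas": [
        {
            "schema-id": 0,
            "type": "struct",
            "fields": [
                {"id": 1, "name": "id", "required": False, "type": "int"},
                {"id": 2, "name": "name", "required": False, "type": "string"},
                {"id": 3, "name": "cgpa", "required": False, "type": "decimal(10, 0)"},
            ],
        }
    ],
}

print(schema_from_metadata(metadata))
```

> With the sample metadata above this recovers the id/name/cgpa columns, which is exactly the schema Hive ends up with no matter what columns the CREATE TABLE statement declares.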
> ||col_name||data_type||
> |id|int|
> |name|string|
> |cgpa|decimal(10,0)|
>
> ||# Detailed Table Information|| ||
> |Database:|iceberg_test_db_hive|
> |OwnerType:|USER|
> |Owner:|hive|
> |CreateTime:|Tue Jan 10 21:49:08 UTC 2023|
> |LastAccessTime:|Fri Dec 12 21:41:41 UTC 1969|
> |Retention:|2147483647|
> |Location:|s3a://s3bucket/warehouse/tablespace/external/hive/iceberg_test_db_hive.db/test_meta|
> |Table Type:|EXTERNAL_TABLE|
> |Table Parameters:| |
> | |EXTERNAL|
> | |bucketing_version|
> | |engine.hive.enabled|
> | |metadata_location|
> | |numFiles|
> | |numRows|
> | |rawDataSize|
> | |serialization.format|
> | |storage_handler|
> | |table_type|
> | |totalSize|
> | |transient_lastDdlTime|
> | |uuid|
> | |write.format.default|
>
> ||# Storage Information|| ||
> |SerDe Library:|org.apache.iceberg.mr.hive.HiveIcebergSerDe|
> |InputFormat:|org.apache.iceberg.mr.hive.HiveIcebergInputFormat|
> |OutputFormat:|org.apache.iceberg.mr.hive.HiveIcebergOutputFormat|
> |Compressed:|No|
> |Sort Columns:|[]|
>
> However, if we skip the column definition entirely, the query fails:
> {code:java}
> create table test_meta_copy2 stored by iceberg tblproperties('metadata_location'='s3a://qe-s3-bucket-weekly-dj5h-dwx-external/clusters/env-dqdj5h/warehouse-1673341391-kkzh/warehouse/tablespace/external/hive/iceberg_test_db_hive.db/test_meta/metadata/00000-7dfd7602-f5e1-4473-97cb-79377d358aa3.metadata.json');
> {code}
> Error:
> {code:java}
> INFO  : Compiling command(queryId=hive_20230110220019_94ffafef-f531-4532-a07c-0e46e3879f19): create table test_meta_copy2 stored by iceberg tblproperties('metadata_location'='s3a://qe-s3-bucket-weekly-dj5h-dwx-external/clusters/env-dqdj5h/warehouse-1673341391-kkzh/warehouse/tablespace/external/hive/iceberg_test_db_hive.db/test_meta/metadata/00000-7dfd7602-f5e1-4473-97cb-79377d358aa3.metadata.json')
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO  : Completed compiling command(queryId=hive_20230110220019_94ffafef-f531-4532-a07c-0e46e3879f19); Time taken: 0.019 seconds
> INFO  : Executing command(queryId=hive_20230110220019_94ffafef-f531-4532-a07c-0e46e3879f19): create table test_meta_copy2 stored by iceberg tblproperties('metadata_location'='s3a://qe-s3-bucket-weekly-dj5h-dwx-external/clusters/env-dqdj5h/warehouse-1673341391-kkzh/warehouse/tablespace/external/hive/iceberg_test_db_hive.db/test_meta/metadata/00000-7dfd7602-f5e1-4473-97cb-79377d358aa3.metadata.json')
> INFO  : Starting task [Stage-0:DDL] in serial mode
> ERROR : Failed
> org.apache.hadoop.hive.ql.metadata.HiveException:
> java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Please provide an existing table or a valid schema)
>     at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1361) ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1366) ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:158) ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:116) ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:360) ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:333) ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:250) ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:111) ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:809) ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:547) ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:541) ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:232) ~[hive-service-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:89) ~[hive-service-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:338) ~[hive-service-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
>     at javax.security.auth.Subject.doAs(Subject.java:423) ~[?:?]
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) ~[hadoop-common-3.1.1.7.2.15.4-6.jar:?]
>     at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358) ~[hive-service-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
>     at java.lang.Thread.run(Thread.java:829) [?:?]
> Caused by: java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Please provide an existing table or a valid schema)
>     at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:349) ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:329) ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1307) ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     ... 28 more
> Caused by: org.apache.hadoop.hive.metastore.api.MetaException: org.apache.hadoop.hive.serde2.SerDeException Please provide an existing table or a valid schema
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:123) ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:80) ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:347) ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:329) ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1307) ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     ... 28 more
> ERROR : DDLTask failed, DDL Operation: class org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation
> org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Please provide an existing table or a valid schema)
>     [stack trace identical to the ERROR : Failed trace above, including both Caused by sections]
> ERROR : FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.ddl.DDLTask.
> java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Please provide an existing table or a valid schema)
> INFO  : Completed executing command(queryId=hive_20230110220019_94ffafef-f531-4532-a07c-0e46e3879f19); Time taken: 0.033 seconds
> INFO  : OK
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)