[
https://issues.apache.org/jira/browse/IMPALA-11503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Boglarka Egyed reassigned IMPALA-11503:
---------------------------------------
Assignee: Peter Rozsa
> Dropping files of Iceberg table in HiveCatalog will cause DROP TABLE to fail
> ----------------------------------------------------------------------------
>
> Key: IMPALA-11503
> URL: https://issues.apache.org/jira/browse/IMPALA-11503
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 4.1.0
> Reporter: Gabor Kaszab
> Assignee: Peter Rozsa
> Priority: Major
> Labels: iceberg, impala-iceberg
>
> When the files of an Iceberg table are dropped, a subsequent DROP TABLE
> results in an error, while the table still shows up in SHOW TABLES.
> Here are the steps to repro:
> 1) Run from Impala-shell
> {code:java}
> DROP DATABASE IF EXISTS `drop_incomplete_table2` CASCADE;
> CREATE DATABASE `drop_incomplete_table2`;
> CREATE TABLE drop_incomplete_table2.iceberg_tbl (i int) stored as iceberg;
> INSERT INTO drop_incomplete_table2.iceberg_tbl VALUES (1), (2), (3); {code}
> 2) Drop the folder of the table with hdfs dfs
> {code:java}
> hdfs dfs -rm -r hdfs://localhost:20500/test-warehouse/drop_incomplete_table2.db/iceberg_tbl
> {code}
> 3) Try to drop the table from Impala-shell
> {code:java}
> DROP TABLE drop_incomplete_table2.iceberg_tbl;
> {code}
> This results in the following error:
> {code:java}
> ERROR: NotFoundException: Failed to open input stream for file: hdfs://localhost:20500/test-warehouse/drop_incomplete_table2.db/iceberg_tbl/metadata/00001-e2568132-d74d-44c2-9b7f-8838453e5944.metadata.json
> CAUSED BY: FileNotFoundException: File does not exist: /test-warehouse/drop_incomplete_table2.db/iceberg_tbl/metadata/00001-e2568132-d74d-44c2-9b7f-8838453e5944.metadata.json
>     at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87)
>     at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77)
>     at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2040)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:737)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
> CAUSED BY: RemoteException: File does not exist: /test-warehouse/drop_incomplete_table2.db/iceberg_tbl/metadata/00001-e2568132-d74d-44c2-9b7f-8838453e5944.metadata.json
>     at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87)
>     at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77)
>     at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2040)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:737)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894) {code}
> Meanwhile, the table is still listed in the SHOW TABLES output, even after
> an INVALIDATE METADATA.
> Note: for the repro it's important to execute some SQL against the newly
> created table so that Impala loads it. Here I used an INSERT INTO, but e.g.
> an ALTER TABLE would also work. Apparently, DROP TABLE works fine while the
> table is "incomplete" (its state right after CREATE TABLE), but fails once
> the table has been loaded.
> The suspicious part of the code is in StmtMetadataLoader.loadTables() and
> getMissingTables(), where a distinction is made between loaded and
> incomplete tables.
> [https://github.com/apache/impala/blob/2f74e956aa10db5af6a7cdc47e2ad42f63d5030f/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java#L196]
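The loaded-vs-incomplete distinction can be illustrated with a minimal, self-contained sketch. The class and method names below are hypothetical paraphrases for illustration only, not the actual Impala StmtMetadataLoader API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the reported behavior: only tables that are still
// "incomplete" stubs are treated as missing and requested from the catalog;
// already-loaded tables are trusted as-is, even if their files have since
// been deleted from HDFS.
public class MissingTablesSketch {
    interface FeTable {
        boolean isLoaded();
        String name();
    }

    // State right after CREATE TABLE: metadata not yet loaded into Impala.
    static final class IncompleteTable implements FeTable {
        private final String name;
        IncompleteTable(String name) { this.name = name; }
        public boolean isLoaded() { return false; }
        public String name() { return name; }
    }

    // State after e.g. INSERT INTO: metadata (including the Iceberg
    // metadata.json path) is cached, and may now dangle if files are gone.
    static final class LoadedTable implements FeTable {
        private final String name;
        LoadedTable(String name) { this.name = name; }
        public boolean isLoaded() { return true; }
        public String name() { return name; }
    }

    static List<FeTable> getMissingTables(List<FeTable> referenced) {
        List<FeTable> missing = new ArrayList<>();
        for (FeTable t : referenced) {
            if (!t.isLoaded()) missing.add(t);
        }
        return missing;
    }

    public static void main(String[] args) {
        List<FeTable> referenced = new ArrayList<>();
        referenced.add(new IncompleteTable("incomplete_tbl"));
        referenced.add(new LoadedTable("iceberg_tbl")); // files already dropped
        List<FeTable> missing = getMissingTables(referenced);
        // Only the incomplete stub is flagged for loading; the loaded table
        // is skipped, so its now-dangling metadata path is still trusted and
        // the DROP TABLE later fails with NotFoundException.
        System.out.println(missing.size() + " " + missing.get(0).name());
    }
}
```

If this sketch matches the real control flow, that would explain why the repro only triggers after the table has been loaded: for an incomplete table the stale metadata path is never consulted.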
>
> Note 2: this issue is quite similar to
> https://issues.apache.org/jira/browse/IMPALA-11502, but the repro steps
> and the error are somewhat different here.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]