[ 
https://issues.apache.org/jira/browse/IMPALA-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18040608#comment-18040608
 ] 

ASF subversion and git services commented on IMPALA-13725:
----------------------------------------------------------

Commit fdad9d32041a736108b876704bd0354090a88d29 in impala's branch 
refs/heads/master from Noemi Pap-Takacs
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=fdad9d320 ]

IMPALA-13725: Add Iceberg table repair functionalities

In some cases users delete files directly from storage without
going through the Iceberg API, e.g. they remove old partitions.

This corrupts the table, and makes queries that try to read the
missing files fail.
This change introduces a repair statement that deletes the
dangling references of missing files from the metadata.
Note that the table cannot be repaired if delete files are
missing, because Iceberg's DeleteFiles API, which is used to
execute the operation, allows removing only data files.

Testing:
 - E2E
   - HDFS
   - S3, Ozone
 - analysis

Change-Id: I514403acaa3b8c0a7b2581d676b82474d846d38e
Reviewed-on: http://gerrit.cloudera.org:8080/23512
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
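
The repair logic described in the commit message above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the commit: the class name, method name, and file paths are hypothetical, and plain Java collections stand in for the table metadata and the storage listing. In a real implementation the dangling paths would presumably be handed to Iceberg's DeleteFiles operation (obtained via Table.newDelete()), which is why a missing delete file makes the sketch refuse to proceed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class RepairSketch {

    /**
     * Returns the data-file paths that the metadata still references but that
     * no longer exist on storage (the "dangling references").
     *
     * All parameters are hypothetical stand-ins:
     *   dataFiles   - data-file paths referenced by the table metadata
     *   deleteFiles - delete-file paths referenced by the table metadata
     *   existing    - paths that actually exist on storage
     *
     * Throws if any delete file is missing, mirroring the limitation noted in
     * the commit message: only data-file references can be removed.
     */
    static List<String> findDanglingDataFiles(
            List<String> dataFiles, List<String> deleteFiles, Set<String> existing) {
        for (String path : deleteFiles) {
            if (!existing.contains(path)) {
                throw new IllegalStateException(
                        "Missing delete file, table cannot be repaired: " + path);
            }
        }
        List<String> dangling = new ArrayList<>();
        for (String path : dataFiles) {
            if (!existing.contains(path)) {
                dangling.add(path);
            }
        }
        return dangling;
    }

    public static void main(String[] args) {
        // Hypothetical example: one data file was removed directly from storage.
        List<String> dataFiles = List.of("data/p=1/a.parquet", "data/p=1/b.parquet");
        List<String> deleteFiles = List.of("data/p=1/d.parquet");
        Set<String> onStorage = Set.of("data/p=1/a.parquet", "data/p=1/d.parquet");

        System.out.println(findDanglingDataFiles(dataFiles, deleteFiles, onStorage));
    }
}
```

Checking the delete files first keeps the operation all-or-nothing in this sketch: either every missing file is a data file and the full list is returned, or nothing is returned at all.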


> Add table repair functionalities
> --------------------------------
>
>                 Key: IMPALA-13725
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13725
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Noémi Pap-Takács
>            Priority: Major
>              Labels: impala-iceberg
>
> In some cases users delete files directly from storage without going through 
> the Iceberg API, e.g. they remove old partitions.
> This corrupts a table for good, so we could add a command that removes the 
> missing files.
> Similar functionality in Spark is in progress: 
> https://github.com/apache/iceberg/pull/12106



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
