Csaba Ringhofer has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/23109 )
Change subject: IMPALA-14189: Cleanup subdirectories in truncate/insert overwrite
......................................................................

IMPALA-14189: Cleanup subdirectories in truncate/insert overwrite

If an external table contains data files in subdirectories, and
recursive listing is enabled, Impala considers the files in the
subdirectories as part of the table. However, currently INSERT
OVERWRITE and TRUNCATE do not always delete these files, leading
to data corruption.

1. TRUNCATE

Currently TRUNCATE can be run in two different ways:
 - if the table is being replicated, the HMS api is used
 - otherwise catalogd deletes the files itself.

Two differences between these methods are:
 - calling HMS leads to an ALTER_TABLE event
 - calling HMS leads to recursive delete while catalogd only
   deletes files directly in the partition/table directory.

This commit solves this problem by always using the HMS api
for TRUNCATE operations.

2. INSERT OVERWRITE

Before this change, for unpartitioned external tables, only
top-level data files were deleted and data files in subdirectories
(whether hidden, ignored or normal) were kept.

After this change, directories are also deleted in addition to
(non-hidden) data files, with the exception of hidden and ignored
directories. (Note: for ignored directories, see
--ignored_dir_prefix_list).
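
The INSERT OVERWRITE rule described above can be sketched roughly as follows. This is only an illustration of the selection logic, not the code in dml-exec-state.cc: the Entry type, the function names, the "hidden means the name starts with '.' or '_'" rule, and the example prefix "staging_" are all assumptions made for the sketch.

```python
from typing import Iterable, List, NamedTuple, Sequence


class Entry(NamedTuple):
    """Hypothetical model of one top-level entry in the table directory."""
    name: str
    is_dir: bool


def is_hidden(name: str) -> bool:
    # Assumption: names starting with '.' or '_' are treated as hidden.
    return name.startswith((".", "_"))


def is_ignored_dir(name: str, ignored_prefixes: Sequence[str]) -> bool:
    # Assumption: a directory is ignored when its name starts with one of
    # the configured prefixes; empty prefixes are skipped so they match nothing.
    return any(p and name.startswith(p) for p in ignored_prefixes)


def entries_to_delete(entries: Iterable[Entry],
                      ignored_prefixes: Sequence[str]) -> List[str]:
    """Return names of top-level entries an overwrite would remove."""
    doomed = []
    for e in entries:
        if is_hidden(e.name):
            continue  # hidden files and hidden directories are kept
        if e.is_dir and is_ignored_dir(e.name, ignored_prefixes):
            continue  # ignored directories are kept
        doomed.append(e.name)  # normal data files and normal directories
    return doomed


if __name__ == "__main__":
    listing = [
        Entry("data.parq", False),
        Entry("sub", True),
        Entry(".hidden_dir", True),
        Entry("_SUCCESS", False),
        Entry("staging_1", True),
    ]
    # "staging_" is an illustrative prefix, not a documented default.
    print(entries_to_delete(listing, ["staging_"]))  # ['data.parq', 'sub']
```

Under the old behaviour described above, only "data.parq" would have been selected and "sub" would have survived the overwrite.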

Testing:
- extended the tests in
  test_recursive_listing.py::TestRecursiveListing to include both
  cases

Change-Id: Ib3ee6cba3a4f41ad9997d0d4f45e1d28af36b72b
---
M be/src/runtime/dml-exec-state.cc
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/metadata/test_recursive_listing.py
3 files changed, 106 insertions(+), 60 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/23109/8
--
To view, visit http://gerrit.cloudera.org:8080/23109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib3ee6cba3a4f41ad9997d0d4f45e1d28af36b72b
Gerrit-Change-Number: 23109
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>