Csaba Ringhofer has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/23109 )

Change subject: IMPALA-14189: Cleanup subdirectories in truncate/insert overwrite
......................................................................

IMPALA-14189: Cleanup subdirectories in truncate/insert overwrite

If an external table contains data files in subdirectories, and
recursive listing is enabled, Impala considers the files in the
subdirectories as part of the table. However, INSERT OVERWRITE and
TRUNCATE currently do not always delete these files, so old rows can
remain visible after the statement, which amounts to data corruption.

1. TRUNCATE
Currently TRUNCATE can be run in two different ways:
 - if the table is being replicated, the HMS API is used;
 - otherwise, catalogd deletes the files itself.
The two methods differ in two respects:
 - calling HMS generates an ALTER_TABLE event;
 - calling HMS deletes recursively, while catalogd only deletes
   files directly in the partition/table directory.

This commit fixes the problem by always using the HMS API for TRUNCATE
operations.
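
The difference between the two deletion strategies can be sketched on a
local filesystem. This is only an illustration of non-recursive versus
recursive deletion; it does not call HMS or catalogd, and all function
names (delete_top_level_files, delete_recursively, list_data_files,
make_table_dir) are hypothetical.

```python
import os
import shutil
import tempfile


def delete_top_level_files(table_dir):
    """Non-recursive delete: remove only files directly in table_dir."""
    for name in os.listdir(table_dir):
        path = os.path.join(table_dir, name)
        if os.path.isfile(path):
            os.remove(path)


def delete_recursively(table_dir):
    """Recursive delete: remove everything under table_dir, keep table_dir."""
    for name in os.listdir(table_dir):
        path = os.path.join(table_dir, name)
        if os.path.isdir(path):
            shutil.rmtree(path)
        else:
            os.remove(path)


def list_data_files(table_dir):
    """Recursive listing: relative paths of all files under table_dir."""
    found = []
    for root, _dirs, files in os.walk(table_dir):
        for f in files:
            found.append(os.path.relpath(os.path.join(root, f), table_dir))
    return sorted(found)


def make_table_dir():
    """Create a temp dir with one top-level file and one nested file."""
    table_dir = tempfile.mkdtemp()
    os.makedirs(os.path.join(table_dir, "sub"))
    for rel in ("a.parq", os.path.join("sub", "b.parq")):
        with open(os.path.join(table_dir, rel), "w") as fh:
            fh.write("x")
    return table_dir
```

With recursive listing, a file left behind in "sub" by the non-recursive
variant would still be counted as table data after the truncate.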

2. INSERT OVERWRITE
Before this change, for unpartitioned external tables, only top-level
data files were deleted and data files in subdirectories (whether
hidden, ignored or normal) were kept.

After this change, directories are also deleted in addition to
(non-hidden) data files, with the exception of hidden and ignored
directories. (Note: for ignored directories, see
--ignored_dir_prefix_list).
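
The new INSERT OVERWRITE cleanup rule can be sketched as follows. This is
a local-filesystem illustration, not the code in dml-exec-state.cc. It
assumes that "hidden" means a name starting with "." or "_", and takes the
ignored directory prefixes as a plain parameter standing in for
--ignored_dir_prefix_list; the function names are hypothetical.

```python
import os
import shutil
import tempfile

# Assumption for this sketch: names starting with "." or "_" are hidden.
HIDDEN_PREFIXES = (".", "_")


def is_hidden(name):
    return name.startswith(HIDDEN_PREFIXES)


def is_ignored_dir(name, ignored_dir_prefixes):
    return any(name.startswith(p) for p in ignored_dir_prefixes)


def cleanup_for_overwrite(table_dir, ignored_dir_prefixes=()):
    """Delete non-hidden files and non-hidden, non-ignored directories.

    Returns the names of the top-level entries that were removed.
    """
    removed = []
    for name in sorted(os.listdir(table_dir)):
        path = os.path.join(table_dir, name)
        if is_hidden(name):
            continue  # hidden files and hidden directories are kept
        if os.path.isdir(path):
            if is_ignored_dir(name, ignored_dir_prefixes):
                continue  # ignored directories are kept
            shutil.rmtree(path)  # normal directory: delete with contents
        else:
            os.remove(path)  # normal top-level data file
        removed.append(name)
    return removed
```

For a directory containing a.parq, _hidden_file, sub/, .hidden_dir/ and
ignored_x/, calling cleanup_for_overwrite(d, ("ignored_",)) would remove
only a.parq and sub/.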

Testing:
 - extended the tests in test_recursive_listing.py::TestRecursiveListing
   to cover both TRUNCATE and INSERT OVERWRITE
Change-Id: Ib3ee6cba3a4f41ad9997d0d4f45e1d28af36b72b
---
M be/src/runtime/dml-exec-state.cc
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/metadata/test_recursive_listing.py
3 files changed, 106 insertions(+), 60 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/23109/8
--
To view, visit http://gerrit.cloudera.org:8080/23109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib3ee6cba3a4f41ad9997d0d4f45e1d28af36b72b
Gerrit-Change-Number: 23109
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
