[ https://issues.apache.org/jira/browse/IMPALA-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18005285#comment-18005285 ]

ASF subversion and git services commented on IMPALA-14224:
----------------------------------------------------------

Commit 9f12714d1cc7830c6ebb1759902facf6a298acbc in impala's branch 
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=9f12714d1 ]

IMPALA-14224: Cleanup subdirectories in TRUNCATE

If an external table contains data files in subdirectories, and
recursive listing is enabled, Impala considers the files in the
subdirectories as part of the table. However, currently INSERT OVERWRITE
and TRUNCATE do not always delete these files, leading to data
corruption.

This change takes care of TRUNCATE.

Currently TRUNCATE can be run in two different ways:
 - if the table is being replicated, the HMS API is used
 - otherwise, catalogd deletes the files itself.
The two methods differ in two respects:
 - calling HMS generates an ALTER_TABLE event
 - calling HMS deletes recursively, while catalogd only deletes files
   located directly in the partition/table directory.
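
The difference between the two delete behaviours can be illustrated on a
local filesystem. This is only a minimal sketch: it does not use Impala,
HMS, or HDFS code, and every name in it (make_table_dir, list_recursive,
delete_direct_files, delete_recursive, remaining_after) is hypothetical.
It shows why a non-recursive delete leaves files that a recursive listing
still treats as table data.

```python
import shutil
import tempfile
from pathlib import Path


def make_table_dir(root):
    """Create a fake table directory with one top-level and one nested file."""
    table = Path(root) / "tbl"
    (table / "subdir").mkdir(parents=True)
    (table / "a.parq").write_text("top-level")
    (table / "subdir" / "b.parq").write_text("nested")
    return table


def list_recursive(table):
    """Mimic recursive listing: every file under the table directory."""
    return sorted(
        p.relative_to(table).as_posix() for p in table.rglob("*") if p.is_file()
    )


def delete_direct_files(table):
    """Delete only the files located directly in the table directory."""
    for p in table.iterdir():
        if p.is_file():
            p.unlink()


def delete_recursive(table):
    """Delete everything under the table directory, keeping the directory."""
    for p in table.iterdir():
        if p.is_dir():
            shutil.rmtree(p)
        else:
            p.unlink()


def remaining_after(delete_fn):
    """Return the files a recursive listing still sees after delete_fn runs."""
    with tempfile.TemporaryDirectory() as tmp:
        table = make_table_dir(tmp)
        delete_fn(table)
        return list_recursive(table)


if __name__ == "__main__":
    print("non-recursive delete leaves:", remaining_after(delete_direct_files))
    print("recursive delete leaves:", remaining_after(delete_recursive))
```

In this sketch the non-recursive variant leaves subdir/b.parq behind, which
corresponds to the stale data described above.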

This commit introduces the '--truncate_external_tables_with_hms' startup
flag, with default value 'true'. If this flag is set to true, Impala
always uses the HMS API for TRUNCATE operations.

Note that HMS always deletes stats on TRUNCATE, so setting the
DELETE_STATS_IN_TRUNCATE query option to false is not supported when
'--truncate_external_tables_with_hms' is set to true: TRUNCATE throws an
exception in that case.
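
The path selection and the stats restriction described above can be
modeled roughly as follows. This is a sketch of the rules as stated in
this commit message, not the catalogd implementation: the function name,
parameter names, return values, and exception type are all hypothetical,
and the behaviour of a replicated table when the flag is false and
DELETE_STATS_IN_TRUNCATE is false is not covered by the message, so the
sketch does not treat it as an error.

```python
class TruncateNotSupported(Exception):
    """Hypothetical error type standing in for the exception Impala throws."""


def choose_truncate_path(table_is_replicated, truncate_with_hms,
                         delete_stats_in_truncate):
    """Return "hms" or "catalogd" according to the rules in the commit message.

    truncate_with_hms models the --truncate_external_tables_with_hms flag;
    delete_stats_in_truncate models the DELETE_STATS_IN_TRUNCATE query option.
    """
    if truncate_with_hms:
        # HMS always deletes stats, so keeping them cannot be honoured.
        if not delete_stats_in_truncate:
            raise TruncateNotSupported(
                "DELETE_STATS_IN_TRUNCATE=false is not supported when "
                "truncate_external_tables_with_hms is true"
            )
        return "hms"
    if table_is_replicated:
        # Replicated tables already went through HMS before this change.
        return "hms"
    # Otherwise catalogd deletes the files itself (non-recursively).
    return "catalogd"


if __name__ == "__main__":
    print(choose_truncate_path(False, True, True))
    print(choose_truncate_path(False, False, False))
```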

Testing:
 - Extended the tests in test_recursive_listing.py::TestRecursiveListing
   to include TRUNCATE.
 - Moved tests with DELETE_STATS_IN_TRUNCATE=0 from truncate-table.test
   to truncate-table-no-delete-stats.test, which is run in a new custom
   cluster test (custom_cluster/test_no_delete_stats_in_truncate.py).

Change-Id: Ic0fcc6cf1eca8a0bcf2f93dbb61240da05e35519
Reviewed-on: http://gerrit.cloudera.org:8080/23166
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Cleanup subdirectories in TRUNCATE
> ----------------------------------
>
>                 Key: IMPALA-14224
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14224
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Daniel Becker
>            Assignee: Daniel Becker
>            Priority: Critical
>
> This issue tracks the problem described in IMPALA-14189 for TRUNCATE. See 
> parent issue for more details.


